feat(compute): add ComputeStrategy interface with ECS Fargate backend#8
Conversation
236d7bd to
bad8867
Compare
There was a problem hiding this comment.
Pull request overview
Introduces a compute strategy abstraction for orchestrating agent sessions, extracting AgentCore-specific session invocation logic out of the shared orchestrator and updating the durable “start-session” step to delegate compute-specific work to the selected strategy.
Changes:
- Added a
ComputeStrategyinterface plusresolveComputeStrategy()factory keyed offblueprintConfig.compute_type. - Extracted AgentCore session invocation/stop logic into
AgentCoreComputeStrategy, and refactored orchestration to use it. - Added/updated unit tests for the new strategy + factory, and adjusted existing orchestrator-related tests accordingly.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
cdk/src/handlers/shared/compute-strategy.ts |
Adds strategy interface/types and a factory for selecting a compute backend. |
cdk/src/handlers/shared/strategies/agentcore-strategy.ts |
Implements AgentCore-specific session start/poll/stop behind the strategy interface. |
cdk/src/handlers/shared/orchestrator.ts |
Removes AgentCore session start logic from the shared orchestrator. |
cdk/src/handlers/orchestrate-task.ts |
Updates the durable start-session step to resolve and use a compute strategy. |
cdk/test/handlers/shared/compute-strategy.test.ts |
Tests strategy resolution and unknown compute type behavior. |
cdk/test/handlers/shared/strategies/agentcore-strategy.test.ts |
Tests AgentCore strategy session start/poll/stop behavior with SDK mocking. |
cdk/test/handlers/orchestrate-task.test.ts |
Removes old startSession tests and adjusts mocks (now mostly orchestrator-focused). |
.github/workflows/build.yml |
Updates CI environment variables and disables several security/scanning tools via MISE_DISABLE_TOOLS. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Thank you ! Couple of findings, will add each one of them as a separate comment. Regarding the CI failures, don't worry about it in this PR, i'll fix it separately |
|
Finding 1 — Dual source of truth for runtimeArn Category: Architecture What happens: constructor(options: { runtimeArn: string }) { Then in startSession, the method immediately overrides it from blueprintConfig: const runtimeArn = input.blueprintConfig.runtime_arn ?? this.runtimeArn; // source 2 Why it matters: return new AgentCoreComputeStrategy({ runtimeArn: blueprintConfig.runtime_arn }); So this.runtimeArn and input.blueprintConfig.runtime_arn are always the same value at construction time. The ?? fallback can never fire. This creates the false impression that two Recommendation: |
|
Finding 2 — Lost SDK client singleton Category: Architecture / Performance Before (main branch): After (PR): The factory resolveComputeStrategy() is called inside the durable execution step 'start-session' at orchestrate-task.ts:120. In durable execution, steps can replay on Lambda Why it matters: Recommendation: let _client: BedrockAgentCoreClient | undefined; |
|
Finding 3 — compute_type is an untyped string Category: Types The CDK construct already defines a strict union at blueprint.ts:50: readonly type?: 'agentcore' | 'ecs'; But the runtime types in BlueprintConfig and RepoConfig declare it as a bare string: // repo-config.ts:49 The factory switch in compute-strategy.ts then does: switch (computeType) { Why it matters:
Recommendation: // repo-config.ts export interface BlueprintConfig { // compute-strategy.ts |
|
Finding 4 — stopSession swallows all errors Category: Error handling async stopSession(handle: SessionHandle): Promise { The entire catch is a logger.warn — the error is swallowed unconditionally. Why it matters:
Logging at warn for all of these treats a billing leak the same as a benign no-op. Recommendation: |
|
Finding 5 — Missing logger.info('Session started') after full sequence Category: Observability Before (main branch — orchestrator.ts:340): After (PR — orchestrate-task.ts:120-133): The strategy itself logs 'AgentCore session invoked' at agentcore-strategy.ts:61, but that fires before the DDB transition and event emit. The original code logged after the entire Why it matters: Recommendation: logger.info('Session started', { task_id: taskId, session_id: handle.sessionId, strategy: handle.strategyType }); |
|
Finding 6 — No integration tests for the rewritten start-session step Category: Testing What was removed: What was added:
What's missing:
No test validates this composition. Specifically:
The old startSession function was monolithic and tested end-to-end. The refactored version split the logic across two layers but only tested the bottom layer. Recommendation: |
|
Finding 7 — pollSession() silently returns 'running' forever Category: Interface contract async pollSession(_handle: SessionHandle): Promise { The ComputeStrategy interface at compute-strategy.ts:38 declares: pollSession(handle: SessionHandle): Promise; This returns SessionStatus, a discriminated union of 'running' | 'completed' | 'failed'. The current implementation always returns 'running'. Why it matters: The existing orchestrator polling in orchestrator.ts:352-366 (pollTaskStatus) reads from DDB, which is the actual source of truth. So currently pollSession is dead code. But:
Recommendation:
|
|
@krokoko All 7 findings addressed in d68edd6. Summary: Finding 1 — Dual source of truth for runtimeArn: Removed Finding 2 — Lost SDK client singleton: Hoisted Finding 3 — compute_type is untyped string: Added Finding 4 — stopSession swallows all errors: Distinguished error types:
Finding 5 — Missing logger.info after full sequence: Added Finding 6 — No integration tests: Created
Finding 7 — pollSession returns 'running' forever: Now throws |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 19 out of 20 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (1)
cdk/src/stacks/agent.ts:257
hasEcsBlueprintis currently hard-coded to always be false (.some(() => false)), soneedsEcseffectively depends only onABCA_ENABLE_ECS. This makes the preceding comment (“Wire ECS infrastructure when any blueprint uses compute.type: 'ecs'”) inaccurate and will silently skip provisioning ECS even if a Blueprint is later configured withcompute.type: 'ecs'. Consider either removinghasEcsBlueprintentirely and documenting that ECS infra is env-flag driven, or implementing a real detection mechanism (e.g., tracking blueprint compute type in code or exposing it from theBlueprintconstruct).
new CfnOutput(this, 'RepoTableName', {
value: repoTable.table.tableName,
description: 'Name of the DynamoDB repo config table',
});
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Keep gitleaks/osv-scanner enabled in CI (only disable trivy/grype/semgrep) - Type ComputeStrategy.type and SessionHandle.strategyType as ComputeType - Trim/filter ECS_SUBNETS to handle whitespace and trailing commas - Handle undefined exit code in ECS pollSession (container never started) - Scope iam:PassRole to specific ECS task/execution role ARNs - Validate all-or-nothing ECS props in TaskOrchestrator constructor - Remove dead hasEcsBlueprint detection; document env-flag driven approach - Add comment noting strategy_type as additive event field
…AgentCore logic Introduce ComputeStrategy interface with SessionHandle/SessionStatus types and resolveComputeStrategy factory. Extract AgentCoreComputeStrategy from orchestrator.ts. Refactor orchestrate-task handler to use strategy pattern for session lifecycle (start/poll/stop). Pure refactor — no behavior change, identical CloudFormation output.
The mise install step downloads tools (trivy) from GitHub releases. Without GITHUB_TOKEN, unauthenticated requests hit the 60 req/hr rate limit, causing flaky CI failures.
Mise uses GITHUB_API_TOKEN (not GITHUB_TOKEN) for authenticated GitHub API requests when downloading aqua tools like trivy.
Trivy, grype, semgrep, osv-scanner, and gitleaks are only needed for security scanning tasks, not for the build/test/synth pipeline. Disable them via MISE_DISABLE_TOOLS to avoid GitHub API rate limits when mise tries to download them on every PR build.
- Keep gitleaks and osv-scanner enabled in CI build (only disable trivy/grype/semgrep which need GitHub API downloads) - Remove unused @aws-sdk/client-bedrock-agentcore mock from orchestrate-task.test.ts (SDK is no longer imported by orchestrator) - Update PR description to note additive strategy_type event field
1. Single source of truth for runtimeArn — removed constructor param,
strategy now reads exclusively from blueprintConfig.runtime_arn
2. Lazy singleton for BedrockAgentCoreClient — module-level shared
client avoids creating new TLS sessions per invocation
3. ComputeType union type ('agentcore' | 'ecs') with exhaustive switch
and never-pattern in resolveComputeStrategy
4. Differentiated error handling in stopSession — ResourceNotFoundException
(info), ThrottlingException/AccessDeniedException (error), others (warn)
5. Added logger.info('Session started') after full invoke+transition+event
sequence in orchestrate-task.ts
6. Added start-session-composition.test.ts with integration tests for
happy path, error path (failTask), and partial failure (transitionTask throws)
7. pollSession now throws NotImplementedError instead of returning stale
'running' status — clear signal for future developers
- Replace require() with ES import for BedrockAgentCoreClient mock - Fix import ordering in start-session-composition test
Wire ECS Fargate as a compute backend behind the existing ComputeStrategy interface, using the existing durable Lambda orchestrator. No separate stacks or Step Functions — ECS is a strategy option alongside AgentCore. Changes: - EcsComputeStrategy: startSession (RunTask), pollSession (DescribeTasks state mapping), stopSession (StopTask with graceful error handling) - EcsAgentCluster construct: ECS Cluster (container insights), Fargate task def (2 vCPU/4GB/ARM64), security group (TCP 443 egress only), CloudWatch log group, task role (DynamoDB, SecretsManager, Bedrock) - TaskOrchestrator: optional ECS props for env vars and IAM policies (ecs:RunTask/DescribeTasks/StopTask conditioned on cluster ARN, iam:PassRole conditioned on ecs-tasks.amazonaws.com) - Orchestrator polling: ECS compute-level crash detection alongside existing DDB polling (non-fatal, wrapped in try/catch) - AgentStack: conditional ECS infrastructure (ABCA_ENABLE_ECS env var) - Full test coverage: 15 ECS strategy tests, 9 construct tests, 5 orchestrator ECS tests. All 563 tests pass. Deployed and verified: stack deploys cleanly, CDK synth passes cdk-nag, agent task running on AgentCore path unaffected.
- Keep gitleaks/osv-scanner enabled in CI (only disable trivy/grype/semgrep) - Type ComputeStrategy.type and SessionHandle.strategyType as ComputeType - Trim/filter ECS_SUBNETS to handle whitespace and trailing commas - Handle undefined exit code in ECS pollSession (container never started) - Scope iam:PassRole to specific ECS task/execution role ARNs - Validate all-or-nothing ECS props in TaskOrchestrator constructor - Remove dead hasEcsBlueprint detection; document env-flag driven approach - Add comment noting strategy_type as additive event field
926778b to
eb0bfa9
Compare
The ECS container's default CMD starts uvicorn server:app which waits for HTTP POST to /invocations — but in standalone ECS nobody sends that request, leaving the agent idle. Override the container command to invoke entrypoint.run_task() directly with the full orchestrator payload via AGENT_PAYLOAD env var. Also add GITHUB_TOKEN_SECRET_ARN to the ECS task definition base environment.
|
Pulled locally and merged main, when looking at it: Critical Issues
cdk/src/handlers/cancel-task.ts:110-125 — The cancel handler hardcodes StopRuntimeSessionCommand for AgentCore. When a user cancels an ECS-backed task, the DDB record transitions to Fix: The cancel handler should use resolveComputeStrategy() to dispatch to the correct strategy's stopSession(), or at minimum store enough metadata (compute_type, session handle) on
cdk/src/handlers/orchestrate-task.ts:128-131 — The old startSession wrote agent_runtime_arn to the task record during the HYDRATING→RUNNING transition. The new strategy-based path omits Fix: Restore writing agent_runtime_arn for AgentCore strategy. For ECS, store compute_type and session metadata on the task record.
cdk/src/constructs/task-orchestrator.ts:180-185, 277-280 — The all-or-nothing validation checks 5 ECS props but omits ecsTaskRoleArn and ecsExecutionRoleArn. If someone provides the 5 Fix: Include role ARNs in the all-or-nothing validation, or throw when ecsClusterArn is set but role ARNs are missing. Never fall back to ['*'].
cdk/src/handlers/orchestrate-task.ts:180-185 — The catch block downgrades ALL exceptions from pollSession to logger.warn, including AccessDeniedException, TypeError, and other Fix: Track consecutive poll failures. After N consecutive failures, escalate to logger.error and fail the task. Distinguish transient errors (throttling) from persistent errors (access Important Issues
cdk/src/handlers/shared/orchestrator.ts:366-395 — The old startSession function and its AgentCore client are still exported but no longer imported by any handler. The remaining test in
cdk/src/constructs/ecs-agent-cluster.ts:75 — /ecs/abca-agent-tasks is hardcoded. Two stacks in the same account/region will collide with ResourceAlreadyExistsException. Let CDK
Both strategies immediately cast fields out of the untyped metadata bag (handle.metadata.runtimeArn as string, handle.metadata.clusterArn as string). A discriminated union keyed on export type SessionHandle =
It throws unconditionally ("not implemented"), so callers cannot safely call pollSession on an arbitrary ComputeStrategy without first checking the type. Either make pollSession
The all-or-nothing constraint is invisible at the type level. A single ecsConfig?: EcsComputeConfig with all fields required would express the constraint at the type level and eliminate
cdk/src/handlers/orchestrate-task.ts:171-179 — Only 'failed' is checked. If the ECS task exits with code 0 but the agent fails to write terminal status to DDB, the orchestrator polls
cdk/src/handlers/orchestrate-task.ts:169 — Creates a new EcsComputeStrategy() up to 1020 times inside the poll loop. Resolve once before the loop and reuse. Test Coverage Assessment: 7/10 Strengths:
Critical Gaps:
Positive Observations
|
End-to-End Verification ✅Deployed the latest Test task: Build a simple calculator web app (HTML/CSS/JS) against
What was verified:
|
|
@MichaelWalker-git could you please just fix the mutation and I'll merge it, thank you ! |
…aker (krokoko review aws-samples#6, aws-samples#8) Two related hardening changes on ``agent/src/progress_writer.py``. Grouped because the shared circuit breaker reuses the error- classification decision to know when NOT to flip itself, and separating the commits would force an awkward "half-fix" intermediate state. ## Findings addressed **aws-samples#6 — Circuit breaker trips on ValidationException (permanent errors)** The pre-fix ``except Exception`` branch fed ALL errors into the same ``_failure_count`` counter. A persistent schema/size error (e.g. ``ValidationException`` from an item >400 KB under a trace-heavy event) counted against the transient-failure budget and tripped the breaker within 3 events, silencing the entire progress stream for the rest of the task — even though most subsequent events were normal size and would have written fine. New behaviour classifies each DDB error into three buckets: - **Permanent (drop event, keep stream alive):** ``ValidationException``, ``ItemCollectionSizeLimitExceededException`` — the individual event is malformed or oversized, retrying or treating as transient would not help. Log at WARN, skip the event, do NOT increment the failure counter. - **Immediate-disable (fatal, don't even try to retry):** ``ResourceNotFoundException``, ``AccessDeniedException``, ``UnauthorizedOperation`` — wrong deploy or IAM misconfig. Disable the breaker on the first occurrence instead of waiting for 3 failures; log at WARN with the error code. Avoids spamming operator dashboards with 3 copies of the same permissions error. - **Transient (trip the breaker on repeated failures, as today):** ``ProvisionedThroughputExceededException``, ``RequestLimitExceeded``, ``ServiceUnavailable``, ``InternalServerError``, plus network-layer (``ConnectionError``, ``EndpointConnectionError``, ``ReadTimeoutError``). Same counter semantics as pre-fix. - **Unknown (default conservative):** counted as transient (counter increments) but logged at ERROR with an explicit ``UNKNOWN`` marker so operators notice and can add new codes to the permanent/transient lists. Does NOT instant-disable — over- correcting from pre-fix behaviour would swap one failure mode for another. Uses ``botocore.exceptions.ClientError`` + ``err.response["Error"]["Code"]`` for AWS errors; class-name matching for non-ClientError (network- layer) paths. Helper: ``_classify_ddb_error(exc) -> Literal["permanent", "immediate_disable", "transient", "unknown"]``. **aws-samples#8 — Dual _ProgressWriter instances with independent circuit breakers** Pre-fix, the runner-level writer (turn/tool events at ``runner.py:240``) and the pipeline-level writer (milestones at ``pipeline.py:303``) each held their own ``_failure_count`` / ``_disabled`` state. If throttling tripped one writer, the other kept writing — creating visible event gaps in the stream that operators could not distinguish from agent activity (milestones firing after turn events stop, or vice versa). Fix: consolidate circuit-breaker state into a module-level ``_SharedCircuitBreaker`` singleton keyed by ``task_id``. Both writers for the same task read/write the same ``(_failure_count, _disabled)`` pair through named methods (``is_disabled``, ``record_failure``, ``record_success``, ``disable``). One task's stream is either healthy (all events flow from both writers) or degraded (no events flow from either). Cannot have a half-alive stream. Semantics notes: - ``record_success`` resets the shared counter but NOT the ``_disabled`` flag. Re-enabling mid-task would let a flaky minute burn the failure budget repeatedly and defeat the breaker's purpose. - Empty-string ``task_id`` (``runner.py`` falls back to sentinel ``"unknown"``) collapses to shared state for all ``"unknown"`` writers. Real task_ids stay isolated. - Writers retain ``_disabled`` / ``_failure_count`` as properties that proxy to the shared map. Existing callers (``hooks.py`` does ``getattr(progress, "_disabled", False)``) and tests that assign ``writer._failure_count = 0`` keep working unchanged — no constructor signature change required, no ``runner.py`` / ``pipeline.py`` edits. - Single ``threading.Lock()`` protects the shared map; DDB write rate (single-digit events/sec) never contends meaningfully. - Test hygiene: ``_reset_circuit_breakers()`` helper rebinds the module global so autouse fixtures give each test a clean slate. ## Tests +24 regression tests net (36 → 60 in ``test_progress_writer.py``). Coverage: - Finding aws-samples#6 classification: ``test_permanent_error_does_not_trip_breaker`` (10 consecutive ``ValidationException`` writes keep ``_disabled=False``), ``test_transient_error_trips_breaker_as_before``, ``test_access_denied_disables_writer_immediately_with_loud_log``, ``test_unknown_exception_treated_as_transient_with_error_log``. - Finding aws-samples#8 sharing: ``test_shared_circuit_breaker_across_writers_same_task_id`` (writer-A trips the breaker; writer-B sees ``is_disabled`` and skips the DDB call), ``test_separate_tasks_have_independent_breakers``, ``test_unknown_sentinel_task_id_is_isolated``, ``test_reset_helper_clears_shared_state_between_tests``. - Plus edge cases: success-interleave resets the counter across writers; ``_disabled`` stays open after re-enabling a success mid-task; thread-safety via concurrent writes. Agent suite: 497 passing (was 473; +24). Refs: krokoko code review on PR aws-samples#52 (findings 6, 8)
Adds the 14 agent-side approval milestone writers (§11.1) on
``_ProgressWriter`` so Chunk 3's hook integration has a typed API
instead of stringly-typed ``write_agent_milestone`` calls, and the
per-task gate counter / per-container sliding-window rate limit /
denial-injection queue on ``PolicyEngine`` that §6.5 requires.
Why now: the hook work lands cleanly only after these surfaces exist
— every code path in ``pre_tool_use_hook``'s REQUIRE_APPROVAL branch
calls one of these helpers. Shipping them separately lets the hook
commit be about the state machine, not the event-shape bookkeeping.
Engine additions:
- ``approval_gate_count`` / ``increment_approval_gate_count``: the
per-task counter §12.9 bounds at ``approvalGateCap``. Session-scoped
in v1; persistence tracked in §17.
- ``approvals_in_last_minute`` / ``record_approval_gate_timestamp``:
sliding-window rate limit (20/min/container, §12.9). Prune on read
so callers see the current count without a separate tick.
- ``queue_denial_injection`` / ``drain_denial_injections``: queue
consumed by ``_denial_between_turns_hook`` at the next Stop seam
(§6.5). Reason is pre-sanitized upstream by ``DenyTaskFn``.
- ``mark_ceiling_shrinking_emitted``: emit-once latch for IMPL-26.
- ``APPROVAL_RATE_LIMIT`` / ``APPROVAL_RATE_WINDOW_S`` module consts
the hook imports rather than re-deriving.
Milestone writers (§11.1 table, 14 agent-emitted of 15):
- ``pre_approvals_loaded``, ``approval_requested``,
``approval_granted``, ``approval_denied``, ``approval_timed_out``,
``approval_stranded``, ``approval_write_failed``,
``approval_resume_failed``, ``approval_poll_degraded``,
``approval_timeout_capped``, ``approval_ceiling_shrinking``,
``approval_cap_exceeded``, ``approval_rate_limit_exceeded``,
``approval_late_win``.
- ``approval_decision_recorded`` (Lambda audit) and
``approval_timeout_capped_at_submit`` (CreateTaskFn) stay on the
Lambda side — Chunk 5 owns those.
Each helper is a thin wrapper over ``_put_event("agent_milestone",
...)`` so the shared circuit-breaker + classifier path (finding aws-samples#6/aws-samples#8)
continues to apply. Metadata keys mirror the §11.1 shapes verbatim
(``maxLifetime_remaining_s`` preserves the design-doc spelling for
downstream parsers).
Tests: +29 total. 17 on ``TestApprovalMilestoneHelpers`` pin the DDB
payload shape for each helper (including the two
``approval_timeout_capped`` reason variants — rule_annotation carries
matching_rule_ids; maxLifetime_ceiling omits the field). 12 on the
engine: counter monotonicity, rate-window prune semantics at window
boundary, denial-queue FIFO + drain-clears, ceiling-shrinking latch
idempotency.
No caller changes — engine and writer surfaces are additive. Hook
integration lands in commit C.
Lands the three read/discovery handlers Chunk 6 (CLI) needs to power
``bgagent pending``, ``bgagent policies list/show``, and
``bgagent notifications configure slack``. Completes §15.2 tasks
14, 15, and 25 (handler side).
Handlers
--------
``get-pending.ts`` (§7.7 — GET /v1/pending)
- Queries ``user_id-status-index`` GSI on TaskApprovalsTable with
``user_id = :caller AND status = :pending``. Without the GSI
this would be a full-table Scan per call — under
``watch -n1 bgagent pending`` that exhausts burst capacity for
the whole fleet (§10.1 finding aws-samples#8).
- Response maps each row to ``PendingApprovalSummary`` with a
derived ``expires_at = created_at + timeout_s`` so the CLI can
render time-to-timeout without doing arithmetic on ISO strings.
- Severity coerced to ``medium`` on unknown values so GSI writes
that drift from the enum don't break the list response.
- Rate-limited 10/min/user (synthetic row on the same table,
namespaced ``RATE#<user>#PENDING`` so it does not collide with
the approve/deny counter).
``get-policies.ts`` (§7.6 — GET /v1/repos/{repo_id}/policies)
- Combines ``BUILTIN_HARD_DENY_POLICIES`` + ``BUILTIN_SOFT_DENY_POLICIES``
with the repo's ``cedar_policies`` blueprint override. Runs the
combined text through ``parseRules`` and returns
``{hard[], soft[]}`` rule summaries.
- 5-minute per-repo in-Lambda cache; cold starts throw it away.
``_resetCacheForTests`` exposed for unit-test isolation.
- Repo ID is URL-decoded from the path (``owner%2Frepo`` common in
CLI UX).
- Rate-limited 30/min/user.
- Blueprint load failure falls back to built-ins with a WARN log;
invalid blueprint cedar text returns 503 ``SERVICE_UNAVAILABLE``
rather than a misleading empty list.
``link-slack-user.ts`` (§11.2 finding aws-samples#4 — POST /v1/notifications/slack/link)
- Writes to SlackUserMappingTable with
``ConditionExpression: attribute_not_exists(slack_user_id)``. This
guard is the entire admission control the §11.2 design hinges on:
even a compromised Slack admin cannot overwrite an existing
mapping.
- Validates ``slack_user_id`` shape (letters, digits, underscores,
2–40 chars) so junk rows cannot land.
- Conflict surface is 409 ``REQUEST_ALREADY_DECIDED`` — reused
error code (the payload message directs the user to unlink via
support).
- Slack link_token end-to-end validation against Slack OAuth is
deferred — v1 accepts the token on trust from the Cognito-authed
caller; it is persisted in CloudWatch for audit.
Supporting primitives
---------------------
``shared/builtin-policies.ts`` — mirrors ``agent/policies/hard_deny.cedar``
and ``agent/policies/soft_deny.cedar`` as TypeScript string constants.
Embedded rather than read from disk because Lambda's esbuild bundler
does not copy non-TS assets by default and a dedicated bundling hook
is more code than the embed. A drift test
(``builtin-policies.test.ts``) asserts byte-equality with the agent
files so any change on one side without the other flips red at build
time.
``shared/cedar-policy.ts`` — ``parseRules`` now skips the unannotated
``base_permit`` entry (both tiers need it as a Cedar catch-all; it
is not a user-facing rule so it stays out of ParsedRule[]). This
matches the agent-side ``_parse_policy_annotations`` behaviour.
Tests: +37 total.
- get-pending (8): 401 on missing auth, 429 on rate limit, empty
result, GSI query shape, row → PendingApprovalSummary with
derived expires_at, severity fallback, missing timeout → expires_at
falls back to created_at, 500 on DDB error.
- get-policies (11): 401/400 validation, built-in rules listed on
empty repo, URL-decoded repo path, custom blueprint rule lands
in soft, per-repo cache across calls, 429 rate limit, 503 on
invalid blueprint cedar, fallback on load failure, hard rules
omit severity / approval_timeout_s, soft rules carry them.
- link-slack-user (8): 401/400 validation, shape check, 201 on
success, 409 on overwrite attempt, 500 on unknown DDB error,
whitespace trim on slack_user_id, ConditionExpression verified.
- builtin-policies (4): drift byte-equality with both agent files,
parseRules round-trip for hard/soft rule IDs.
- cedar-policy (updated): ``base_permit`` is skipped from
ParsedRule[] rather than rejected.
Stack wiring (task-api.ts routes, agent.ts layer attachment,
CreateTaskFn extension, orchestrator + reconciler + fanout) lands in
the next commit.
…ws-samples#79 review aws-samples#8) Original set omitted documented Slack permission/scope failures. Codes outside the set fall to the retryable branch, so a misconfiguration like ``ekm_access_denied`` or ``missing_scope`` would burn 3 Lambda retries before DLQ on every event — even though the failure is fundamentally a configuration bug that no retry can clear. Adds: - Permission/scope: missing_scope, ekm_access_denied, team_access_not_granted, posting_to_general_channel_denied - Payload shape: invalid_arguments Reorganized the set into commented blocks (channel-shape, auth, permission/scope, payload-shape) so future additions go in the right bucket and the rationale stays visible. Test coverage: parametrized over the full TERMINAL_SLACK_API_ERRORS set (21 codes) — every one must throw SlackApiError so the router swallows it. The existing retryable test.each remains intact and covers the negative-class case (codes outside the set throw a plain Error and escalate to retry).
#79) * feat(fanout): migrate SlackNotifyFn to FanOutConsumer subscriber (#64) Move the Slack outbound delivery off its own DynamoDB Streams consumer onto FanOutConsumer as a per-channel dispatcher. Drops TaskEventsTable from 2 concurrent stream readers to 1, restoring headroom for future channels (Email, Teams, etc.) without exceeding the documented DynamoDB Streams 2-reader-per-shard practical limit. The PR also addresses an adversarial code review on the original migration; the body below walks through each piece in the order it landed. ## (a) Migration - `cdk/src/handlers/slack-notify.ts` — rewritten as exported `dispatchSlackEvent(event, ddb)` plus a tagged `SlackApiError` class. The standalone `handler(event)` stream entrypoint is gone; the FanOutConsumer is now the only TaskEventsTable stream reader. Behaviour preserved bit-for-bit: channel_source==='slack' gate, terminal-event dedup via conditional UpdateItem on `channel_metadata.slack_notified_terminal`, threaded replies under the @mention or task_created message, emoji transitions (eyes -> hourglass -> ✅/❌/🚫/⏲), DM channel_id -> user_id rewrite, intermediate session+created message cleanup on terminal events. - `cdk/src/handlers/fanout-task-events.ts` — replaces the log-only `dispatchToSlack` stub with a wrapper that calls dispatchSlackEvent and routes errors via the new typed contract (see (b) below). Slack defaults gain task_created, session_started, task_timed_out so the router fans out the lifecycle events the old SlackNotifyFn handled; the dispatcher's channel_source gate keeps non-Slack tasks unaffected. - `cdk/src/constructs/fanout-consumer.ts` — adds a scoped `secretsmanager:GetSecretValue` grant on `bgagent/slack/*` so the fanout Lambda can fetch per-workspace bot tokens. Same scope the old SlackNotifyFn role held. - `cdk/src/constructs/slack-integration.ts` — deletes SlackNotifyFn, its DynamoEventSource, its IAM policy, and its NagSuppressions entry. Drops the now-unused StartingPosition / FilterCriteria / FilterRule / lambdaEventSources imports. After this lands, `aws lambda list-event-source-mappings` shows exactly one consumer of the TaskEventsTable stream (FanOutFn); verified on the dev stack with end-to-end @mention + cancel + CLI isolation scenarios. ## (b) Review fix #1 — partial-batch retry semantics (BLOCKER) The first review pass found that the post-migration handler silently dropped Slack-side infra errors (DDB throttle on the GetItem, Secrets Manager 5xx, transient Slack timeout). Pre-migration the SlackNotifyFn handler rethrew non-SlackApiError so Lambda retried the batch; post-migration `Promise.allSettled` swallowed the rejection and routeEvent returned an empty list with no escalation path to `batchItemFailures`. routeEvent's return type changed from `NotificationChannel[]` to `{ dispatched, infraRejections }`. The handler now pushes the record into `batchItemFailures` whenever `infraRejections.length>0`, so Lambda replays the record under the partial-batch contract. The warn line on rejection is tagged `retryable: true` so operators can alert distinctly from the channel-terminal swallow path. GitHub got the symmetric treatment: 4xx (excluding the existing 401 and 404 handling) is now treated as a channel-terminal swallow via `fanout.github.api_error` instead of escalating to retry. ## (c) Review fix #2 — split SlackApiError into terminal + retryable Originally any `!result.ok` Slack response was wrapped in SlackApiError and swallowed. That collapsed retryable codes (`ratelimited`, `service_unavailable`, `internal_error`, `fatal_error`, `request_timeout`) into the same swallow as `channel_not_found` — a tier-1 Slack outage would silently drop every message. Introduced `TERMINAL_SLACK_API_ERRORS` set + `classifySlackError` helper. Terminal codes still throw SlackApiError (router swallows). Retryable codes throw a plain Error so the router classifies them as infra rejections and Lambda replays. ## (d) Review fix #3 — NOTIFIABLE_EVENTS / CHANNEL_DEFAULTS drift The original migration added task_created/session_started/task_timed_out to CHANNEL_DEFAULTS.slack but the dispatcher's NOTIFIABLE_EVENTS gate already excluded several events the router was subscribing Slack to (agent_error, pr_created, task_stranded). Result: Slack was reported as `dispatched` for events it silently dropped — telemetry lied, agent_error never reached operators on Slack-origin tasks, and task_stranded rendered the generic "Event: task_stranded for owner/repo" fallback (UX regression). Added render cases for task_stranded and agent_error in slack-blocks.ts and added them to NOTIFIABLE_EVENTS. Forward-compat approval_required and status_response stay out of NOTIFIABLE_EVENTS until their emitters ship; a new cross-file consistency test in fanout-task-events.test.ts fails if anyone re-introduces the drift. The Slack dispatcher wrapper now passes `effectiveEventType` so an agent_milestone(pr_created) wrapper is unwrapped before NOTIFIABLE_EVENTS matching. Without the rewrite, the dispatcher would short-circuit on the wrapper string `agent_milestone`. ## (e) Review fix #4 — conditional UpdateItem on lifecycle persists Once the BLOCKER fix made batches retry, the original task_created and session_started UpdateItem calls became hazardous: a Slack POST that succeeded but whose follow-up UpdateItem failed transiently would, on retry, post a second root and overwrite slack_thread_ts — orphaning every threaded reply that had threaded under the first ts. Both UpdateItems now carry an `attribute_not_exists` ConditionExpression on the relevant `channel_metadata.slack_*_msg_ts`. On ConditionalCheckFailedException the handler logs at info, deletes the duplicate Slack message via `chat.delete`, and returns. Sibling retry wins the race; the duplicate is cleaned up. ## (f) Dev-stack regression: drop pr_created from Slack defaults Live verification surfaced a UX duplication: pr_created (subscribed in CHANNEL_DEFAULTS.slack as the original §6.2 design called for) and task_completed both rendered messages with View PR buttons, posted seconds apart. The original SlackNotifyFn had silently dropped pr_created (NOTIFIABLE_EVENTS gate), so users hadn't relied on it. Removed pr_created from CHANNEL_DEFAULTS.slack and from NOTIFIABLE_EVENTS, and removed the prCreatedMessage renderer. GitHub keeps pr_created (its edit-in-place comment surface genuinely benefits from the early checkpoint). ## Verification - mise //cdk:compile — clean - mise //cdk:test — 1183 / 1183 pass (8 net-new tests added for the review fixes: NOTIFIABLE_EVENTS drift guard, retryable Slack codes, GitHub 4xx swallow, infra rejection escalation, SlackApiError swallow, task_stranded render) - mise //cdk:eslint — clean - mise //cdk:synth — confirms exactly one Lambda::EventSourceMapping on TaskEventsTable, pointing at FanOutFn - Dev-stack scenarios — @mention happy path, Cancel button, CLI submit (channel_source=api -> zero Slack dispatches, GitHub edit-in-place still fires) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(fanout): retry GitHub 403/429 instead of swallowing as terminal (#79 review #1) PR #79 review found that the new 4xx terminal-swallow path treats HTTP 403 and 429 as channel-terminal — but on GitHub these are transient rate-limit responses (403 with "API rate limit exceeded", 429 "Too Many Requests"). Under a reconciliation wave that touches many tasks, an entire window of GitHub comment updates would be permanently dropped with only a warn log. Carve out 403 and 429 from the swallow guard so they propagate as infra rejections through ``Promise.allSettled``. The record lands in ``batchItemFailures`` and Lambda replays until the rate-limit window clears (or DLQs after ``retryAttempts``). Test coverage: parametrized over 403 + 429 with a GitHubCommentError mock at the helper boundary, asserting the record's eventID surfaces in ``batchItemFailures`` rather than being absorbed. * fix(fanout): guard Slack Secrets Manager grant on a prop (#79 review #2) Every other external-service grant in FanOutConsumer (taskTable, repoTable, githubTokenSecret) is gated by ``if (props.X)``, so a deployment that hasn't onboarded the corresponding service stays free of dangling IAM permissions. The original migration broke the pattern with an unconditional ``bgagent/slack/*`` Secrets Manager grant — dev stacks without Slack onboarding ended up holding read permission on a resource pattern they never use, with a misleading ``cdk-nag AwsSolutions-IAM5`` suppression reason. Adds an optional ``slackSecretArnPattern`` prop on ``FanOutConsumerProps``; the policy statement is only attached when the prop is set. ``cdk/src/stacks/agent.ts`` now computes the ``bgagent/slack/*`` ARN inline and passes it through, mirroring the other guarded props. ``ArnFormat`` and ``Stack`` imports moved out of fanout-consumer.ts since the construct no longer needs them. No changes to live behaviour — agent.ts always passes the prop, so the IAM policy still attaches in production. The dispatcher will log-and-fail-retry on a missing pattern (covered by review #3 fix). Test gap covering the construct itself ships in a follow-up commit (test gap #34). * fix(fanout): throw on missing TASK_TABLE_NAME env var (#79 review #3) Pre-fix: when ``TASK_TABLE_NAME`` was unset on a Slack-subscribed event, ``dispatchSlackEvent`` returned silently after a warn line. The router counted Slack as ``dispatched`` and a broken stack quietly dropped every Slack notification — operators only saw it in the warn-rate metric, with no rejected-channel signal. Post-fix: throw a plain Error so the rejection propagates as an infra rejection through ``Promise.allSettled``. The router pushes the record into ``batchItemFailures``, Lambda retries the batch, the ``fanout.dispatcher.rejected`` warn fires per record, and operators get a distinct alarm. Also bumps the existing log line from ``warn`` to ``error`` and attaches an ``error_id: FANOUT_SLACK_MISSING_TASK_TABLE`` so the deployment-bug case can be distinguished from per-record failures. Test: ``throws when TASK_TABLE_NAME env var is missing`` deletes the env var, asserts the throw, asserts no DDB call was attempted (env-var guard fires first). * fix(fanout): match SlackApiError by name as well as instanceof (#79 review #7) When a bundler ever duplicates the slack-notify module (rare with NodejsFunction tree-shaking but possible if dual-bundled), two distinct SlackApiError classes coexist and ``instanceof`` against one fails for instances of the other. The dispatcher would see a foreign-class SlackApiError, fall through to the rethrow branch, and the router would treat it as an infra rejection — flipping a channel-terminal swallow into infinite Lambda retries. Add an ``err.name === 'SlackApiError'`` fallback so the swallow branch fires either way. Mirrors the duck-typed ``GitHubCommentError`` check used elsewhere in the same handler. Test: synthesise a plain Error with name === 'SlackApiError' (NOT an instance of the mock's SlackApiError class) and assert batchItemFailures stays empty — proving the swallow path catches both shapes. * fix(fanout): extend TERMINAL_SLACK_API_ERRORS with permission codes (#79 review #8) Original set omitted documented Slack permission/scope failures. Codes outside the set fall to the retryable branch, so a misconfiguration like ``ekm_access_denied`` or ``missing_scope`` would burn 3 Lambda retries before DLQ on every event — even though the failure is fundamentally a configuration bug that no retry can clear. Adds: - Permission/scope: missing_scope, ekm_access_denied, team_access_not_granted, posting_to_general_channel_denied - Payload shape: invalid_arguments Reorganized the set into commented blocks (channel-shape, auth, permission/scope, payload-shape) so future additions go in the right bucket and the rationale stays visible. Test coverage: parametrized over the full TERMINAL_SLACK_API_ERRORS set (21 codes) — every one must throw SlackApiError so the router swallows it. The existing retryable test.each remains intact and covers the negative-class case (codes outside the set throw a plain Error and escalate to retry). * fix(fanout): promote Slack reaction/delete network errors to error logs (#79 review #5) The reaction / delete helpers (``addReaction``, ``removeReaction``, ``deleteMessage``) used to log every catch at warn with a single generic event key, lumping API-level rejections (e.g. ``no_reaction``) together with infrastructure failures (DNS lookup, TLS handshake, fetch timeout, JSON parse error from a hostile gateway). Operators who alarmed on the warn rate saw a flat signal that masked genuine infra problems. Split the boundary: - API-level (``!result.ok`` after a successful HTTP call) stays at warn with channel-specific event keys (``fanout.slack.reaction_add_api_error``, ``fanout.slack.reaction_remove_api_error``, ``fanout.slack.message_delete_api_error``). These are per-message UX problems; operators don't page. - Network errors (the outer ``catch (err)`` after ``fetch``) promote to ``logger.error`` with dedicated event keys (``fanout.slack.reaction_add_network_error``, ``fanout.slack.reaction_remove_network_error``, ``fanout.slack.message_delete_network_error``) and ``error_id``s (``FANOUT_SLACK_REACTION_NETWORK``, ``FANOUT_SLACK_DELETE_NETWORK``) so each has its own alarmable signal. User-visible symptoms when these fire silently: stale emoji reactions (hourglass never swaps to ✅) and orphaned intermediate messages. Behaviour unchanged: errors are still swallowed (per-message reactions and intermediate cleanup are best-effort by design; they must not fail the batch), but operators now get distinct metrics for each failure class. * fix(fanout): emit fanout.slack.dup_delete_failed on ghost-message accumulation (#79 review #6) The conditional UpdateItem dup-delete path (``task_created`` / ``session_started`` lifecycle persists) calls ``deleteMessage`` to clean up the duplicate Slack message that landed when a sibling retry won the race. The delete is inherently best-effort — but if it fails, the duplicate becomes a permanent ghost in the thread and operators had no way to alarm on the rate. Refactor ``deleteMessage`` to return a boolean (``true`` on success or ``message_not_found``-as-already-gone, ``false`` otherwise) and emit a dedicated ``fanout.slack.dup_delete_failed`` event with an ``error_id: FANOUT_SLACK_DUP_DELETE_FAILED`` from the dup-delete callsites when the cleanup couldn't complete. The terminal-event cleanup paths (``slack_session_msg_ts``, ``slack_created_msg_ts``) intentionally don't fire this event — those paths target genuinely-stale UX cleanup, not retry-driven duplicates, so an alarm there would be noise. No new tests beyond the existing dup-delete coverage; the ``deleteMessage`` return value isn't yet asserted at the unit level, but the behavior is fully exercised by the existing ``dup-delete`` integration paths (test gap #31 will add an explicit failure-path assertion when it lands). * chore(fanout): tighten RouteOutcome arrays to ReadonlyArray (#79 review #9) ``RouteOutcome.dispatched`` and ``infraRejections`` were typed as plain ``NotificationChannel[]`` — which made ``readonly`` on the property prevent reassignment but still allow callers to mutate the underlying array via ``.push``, ``.splice``, or ``.sort``. Inconsistent with the ``ReadonlySet<string>`` used for ``CHANNEL_DEFAULTS`` in the same file. Tightening to ``ReadonlyArray<NotificationChannel>`` makes the contract honest: the router owns the arrays, callers read them. Test suite updated to use ``[...outcome.dispatched].sort()`` where it previously called ``.sort()`` directly — the explicit copy makes the intent clear and would have surfaced any silent test-side mutation. * refactor(fanout): make SlackDispatchEvent a type alias of FanOutEvent (#79 review #10) The two interfaces were structurally identical: same five fields, same readonly modifiers, same metadata shape. The decoupling was purely nominal and a silent-drift footgun — adding a field to ``FanOutEvent`` (e.g. when the router starts plumbing an ``approval_required`` ID through) would not flow into ``SlackDispatchEvent``, leaving the dispatcher unaware until a downstream test happened to fail. Replace with a one-line type alias: export type SlackDispatchEvent = FanOutEvent; The slack-notify module now type-imports ``FanOutEvent`` from fanout-task-events. ``import type`` is erased at compile time, so the runtime bundle still has the one-way dep (fanout-task-events → slack-notify) — no module-cycle hazard. Reviewer-suggested ``Pick<FanOutEvent, 'task_id' | …>`` was considered and rejected: the dispatcher uses every field of ``FanOutEvent``, so the Pick would just enumerate the same five fields with extra noise. A direct alias keeps the intent obvious and prevents drift identically. * fix(fanout): generalize Slack dedup to cover agent_error + log Retry-After (#79 review #4) PR #79 review #4 surfaced a sibling-channel-failure hazard: when GitHub or Email rate-limits, the record lands in ``batchItemFailures``. On the Lambda retry, every Slack-subscribed event for that record runs again. Terminal events were already guarded by ``slack_notified_terminal``; ``agent_error`` was not — operators would page twice on a single agent failure if a sibling channel happened to fail. Generalize the dedup mechanism. ``TERMINAL_EVENTS`` is replaced by a ``SLACK_DEDUP_ATTRIBUTE`` map that marks each event type with the ``channel_metadata`` attribute that should guard the post: - 5 terminals share ``slack_notified_terminal`` (any first-arriving terminal claims the right; subsequent terminals dedup against it) - ``agent_error`` gets its own ``slack_dispatched_agent_error`` so a duplicate agent_error doesn't reuse the terminal slot - ``task_created`` / ``session_started`` map to ``null`` because they already use the per-event ``slack_*_msg_ts`` conditional persists from review #1 — the conditional already provides full idempotency (a separate marker would be redundant) Also surfaces Slack's ``Retry-After`` header on rate-limited responses through a dedicated ``fanout.slack.retryable_api_error`` warn so operators reading CloudWatch can see the recovery window instead of guessing from sustained warn rate. Tests: - logs Retry-After header on rate-limited Slack responses (new): asserts ``retry_after_seconds`` propagates from Slack's response header into the warn metadata - existing terminal-codes parametrized test untouched (terminal branch doesn't read headers) - existing retryable test gains a ``headers: { get: () => null }`` stub on the fetch mock so the headers.get call doesn't crash Reviewer suggested a per-channel dispatch bitmap as the alternative. Rejected as premature: the duplicate-GitHub-PATCH is harmless (idempotent), Email is still a stub, and the dedup map covers the specific agent_error pain identified above. A bitmap would add a new table + IAM grants + per-dispatch DDB cost for a hypothetical problem (Slack rate-limiting AND a sibling channel failure). * test(fanout): conditional UpdateItem race + dup-delete coverage (#79 test gap) Adds 4 tests covering the lifecycle-persist conditional path that review fix #1 introduced and review fix #6 hardened. Pre-PR-#79 the only ConditionalCheckFailed coverage was the terminal-dedup path; the new lifecycle-persist + dup-delete code lacked direct assertions and was flagged 9/10 criticality by the reviewer. - task_created persist ConditionalCheckFailed → posts duplicate then deletes it: pins the cleanup behaviour that prevents ghost task_created posts in the channel - session_started persist ConditionalCheckFailed → posts duplicate then deletes it: parallel coverage for the other lifecycle attribute (slack_session_msg_ts) - dup-delete failure emits fanout.slack.dup_delete_failed with error_id: pins the operator-alarm signal added in review fix #6; asserts both the event key and the FANOUT_SLACK_DUP_DELETE_FAILED error_id propagate - chat.delete returning message_not_found is treated as success (no dup_delete_failed): negative-class assertion. Prevents false-positive alarms when the race resolves cleanly (the duplicate was already deleted by a prior retry). The ghost / message_not_found tests use ``fetchMock.mockImplementation`` URL-routing rather than ``.mockResolvedValueOnce`` chains because ``updateReaction`` issues 2-3 reaction-API fetches between chat.postMessage and chat.delete; routing by URL keeps the test focused on the load-bearing chat.delete behaviour without coupling to reaction call order. * test(fanout): cover task_stranded + agent_error renderers (#79 test gap #32) Pre-PR-#79 the new ``taskStrandedMessage`` and ``agentErrorMessage`` helpers in slack-blocks.ts had no direct unit tests. Reviewer flagged this as a 7/10 gap because the renderers carry the prior_status / error_type / message_preview metadata threaded through from the event source — silent drift in the metadata field names would produce ugly fallback messages in production. Adds 5 tests: - task_stranded WITH metadata renders the prior_status parenthetical (``Task stranded for org/repo (last status: RUNNING)``) so operators can tell at a glance whether the task hung in HYDRATING vs RUNNING — without the parenthetical the reviewer's "generic Event: ..." UX regression would resurface. - task_stranded WITHOUT metadata still renders cleanly (legacy events written before the reconciler started stamping metadata must not crash or leak ``undefined``). - agent_error with full metadata (error_type + message_preview) renders the rotating_light, type, and preview. - agent_error WITHOUT metadata stays sensible — no leaked ``undefined`` strings or empty ``_Type:_`` line. - agent_error truncates a 500-char message_preview to keep Slack channel UX readable. * test(fanout): cover agent_error dedup + dedup-slot isolation (#79 test gap #33) Pre-PR-#79 review-fix #4 there was no direct test for the ``slack_dispatched_agent_error`` dedup attribute or its interaction with the existing ``slack_notified_terminal`` slot. A future refactor that collapsed the two slots — or renamed one of them — would silently break the sibling-channel-failure-retry guarantee that fix #4 added. Adds 4 tests: - ``agent_error claims its own dedup attribute``: pins the UpdateExpression and ConditionExpression strings so a refactor that renames the attribute breaks loudly. - ``agent_error retry hits the dedup guard``: end-to-end scenario matching review #4 — task already has ``slack_dispatched_agent_error: true``, retry must short-circuit before chat.postMessage. Without the guard, a second rotating_light fires. - ``terminal dedup attribute is per-class``: a flaky task_completed-then-task_failed sequence dedups against the same ``slack_notified_terminal`` slot. Catches the regression where the orchestrator emits both terminal types and we'd otherwise post both ✅ and ❌. - ``agent_error and terminals use distinct dedup slots``: the important negative — having ``slack_dispatched_agent_error`` set must NOT shadow a subsequent ``task_completed``. Pins the slot separation so a future merge into a single slot can't silently drop terminals after an agent_error. * test(fanout): add construct-level tests for FanOutConsumer (#79 test gap #34) The construct shipped on issue #64 with no unit-level coverage of its IAM contract. The only synth-level signal lived inside ``slack-integration.test.ts`` ("0 EventSourceMapping") which proved the migration didn't regress the OTHER construct. Reviewer flagged this 6/10 — and the gap is what allowed review #2 (unconditional Slack secret grant) to slip through in the first place. Adds 6 tests: - ``attaches a single DynamoEventSource on the TaskEventsTable stream``: pins the architectural invariant — issue #64 was fundamentally about reaching exactly-one stream reader. Adding a second consumer must fail this test loudly. - ``creates a DLQ for the fanout Lambda``: pins retention period + presence; a DLQ-less deployment would silently drop poison-pill records past retryAttempts. - ``omits the bgagent/slack/* grant when slackSecretArnPattern is not provided``: the review #2 invariant. Iterates every IAM::Policy and asserts NONE of them grant secretsmanager:* on a bgagent/slack/* ARN. A regression that re-introduces the unconditional grant breaks this test. - ``attaches the bgagent/slack/* grant only when slackSecretArnPattern is provided``: the positive case. Pins the grant shape (action, effect, resource pattern). - ``passes TASK_TABLE_NAME env var when taskTable is provided``: review #3 dependency — the dispatcher throws on missing env. - ``omits TASK_TABLE_NAME env var when taskTable is not provided``: graceful degrade for dev stacks that haven't onboarded the TaskTable yet (matches the construct's documented contract). * test(fanout): cover task_stranded through terminal dedup (#79 test gap #35) The reconciler at handlers/reconcile-stranded-tasks.ts:170 emits BOTH ``task_stranded`` and ``task_failed`` for a heartbeat-expired task — one for the operator signal, one to drive the FAILED status transition. Pre-PR-#79 this pair had no test coverage; reviewer flagged this 8/10 because the visible failure mode (a paired "Task stranded" + "Task failed" double-page in Slack) would surface in production but be silent in CI. Adds 2 tests: - ``task_stranded posts and writes the terminal dedup marker on first arrival``: pins that task_stranded participates in the shared terminal slot and renders the warning message with metadata. Catches a regression that omits task_stranded from the dedup map entirely. - ``task_stranded after a sibling task_failed dedups``: the operational scenario — task_failed already claimed ``slack_notified_terminal``; the subsequent task_stranded must short-circuit before chat.postMessage. Without this guard, operators get the double-page the reviewer warned about. * fix(fanout): re-read TaskRecord before terminal cleanup to close orphan-message race Live observation during PR #79 review verification: the same Slack @mention happy path sometimes leaves the 🚀 task_created message in the thread (orphaned beside the ✅ task_completed) and sometimes deletes it cleanly. The race window: 1. ``task_created`` stream batch posts the rocket message and persists ``slack_created_msg_ts`` via the conditional UpdateItem introduced in PR #79 review fix #1. 2. ``task_completed`` stream batch fires ~30s later. Its initial GetItem races the prior UpdateItem and sees a stale ``channel_metadata`` WITHOUT ``slack_created_msg_ts``. 3. The terminal cleanup branch checks ``channelMeta.slack_created_msg_ts`` — undefined — silently skips the chat.delete. The rocket message stays in the thread. Add a fresh GetItem inside the TERMINAL_EVENTS cleanup branch, after the dedup UpdateItem has linearized our view of the table. Any prior ``slack_*_msg_ts`` writes are visible by then, so the cleanup fires correctly. On a re-read failure (DDB throttle / transient blip) we fall back to the dispatch-entry snapshot and emit ``fanout.slack.cleanup_reread_failed`` so operators can alarm on the rate. Pre-existing race (the unconditional UpdateItem in pre-PR-#79 was the same shape — wrote, GetItem on the next batch could miss it). PR #79 doesn't introduce it but doesn't fix it either; this commit does, since the live screenshot evidence appeared during review verification. Tests: - ``terminal cleanup re-reads TaskRecord``: scripts a stale dispatch-entry GetItem followed by a fresh re-read GetItem with ``slack_created_msg_ts`` present; asserts chat.delete fires against the freshly-read ts. - ``terminal cleanup falls back to dispatch-entry snapshot when re-read fails``: defense-in-depth — DDB throttle on the re-read must not break terminal delivery; cleanup uses the entry snapshot and emits the fallback warn. --------- Co-authored-by: bgagent <bgagent@noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Alain Krok <alkrok@amazon.com>
Adds the 14 agent-side approval milestone writers (§11.1) on
``_ProgressWriter`` so Chunk 3's hook integration has a typed API
instead of stringly-typed ``write_agent_milestone`` calls, and the
per-task gate counter / per-container sliding-window rate limit /
denial-injection queue on ``PolicyEngine`` that §6.5 requires.
Why now: the hook work lands cleanly only after these surfaces exist
— every code path in ``pre_tool_use_hook``'s REQUIRE_APPROVAL branch
calls one of these helpers. Shipping them separately lets the hook
commit be about the state machine, not the event-shape bookkeeping.
Engine additions:
- ``approval_gate_count`` / ``increment_approval_gate_count``: the
per-task counter §12.9 bounds at ``approvalGateCap``. Session-scoped
in v1; persistence tracked in §17.
- ``approvals_in_last_minute`` / ``record_approval_gate_timestamp``:
sliding-window rate limit (20/min/container, §12.9). Prune on read
so callers see the current count without a separate tick.
- ``queue_denial_injection`` / ``drain_denial_injections``: queue
consumed by ``_denial_between_turns_hook`` at the next Stop seam
(§6.5). Reason is pre-sanitized upstream by ``DenyTaskFn``.
- ``mark_ceiling_shrinking_emitted``: emit-once latch for IMPL-26.
- ``APPROVAL_RATE_LIMIT`` / ``APPROVAL_RATE_WINDOW_S`` module consts
the hook imports rather than re-deriving.
Milestone writers (§11.1 table, 14 agent-emitted of 15):
- ``pre_approvals_loaded``, ``approval_requested``,
``approval_granted``, ``approval_denied``, ``approval_timed_out``,
``approval_stranded``, ``approval_write_failed``,
``approval_resume_failed``, ``approval_poll_degraded``,
``approval_timeout_capped``, ``approval_ceiling_shrinking``,
``approval_cap_exceeded``, ``approval_rate_limit_exceeded``,
``approval_late_win``.
- ``approval_decision_recorded`` (Lambda audit) and
``approval_timeout_capped_at_submit`` (CreateTaskFn) stay on the
Lambda side — Chunk 5 owns those.
Each helper is a thin wrapper over ``_put_event("agent_milestone",
...)`` so the shared circuit-breaker + classifier path (finding aws-samples#6/aws-samples#8)
continues to apply. Metadata keys mirror the §11.1 shapes verbatim
(``maxLifetime_remaining_s`` preserves the design-doc spelling for
downstream parsers).
Tests: +29 total. 17 on ``TestApprovalMilestoneHelpers`` pin the DDB
payload shape for each helper (including the two
``approval_timeout_capped`` reason variants — rule_annotation carries
matching_rule_ids; maxLifetime_ceiling omits the field). 12 on the
engine: counter monotonicity, rate-window prune semantics at window
boundary, denial-queue FIFO + drain-clears, ceiling-shrinking latch
idempotency.
No caller changes — engine and writer surfaces are additive. Hook
integration lands in commit C.
Lands the three read/discovery handlers Chunk 6 (CLI) needs to power
``bgagent pending``, ``bgagent policies list/show``, and
``bgagent notifications configure slack``. Completes §15.2 tasks
14, 15, and 25 (handler side).
Handlers
--------
``get-pending.ts`` (§7.7 — GET /v1/pending)
- Queries ``user_id-status-index`` GSI on TaskApprovalsTable with
``user_id = :caller AND status = :pending``. Without the GSI
this would be a full-table Scan per call — under
``watch -n1 bgagent pending`` that exhausts burst capacity for
the whole fleet (§10.1 finding aws-samples#8).
- Response maps each row to ``PendingApprovalSummary`` with a
derived ``expires_at = created_at + timeout_s`` so the CLI can
render time-to-timeout without doing arithmetic on ISO strings.
- Severity coerced to ``medium`` on unknown values so GSI writes
that drift from the enum don't break the list response.
- Rate-limited 10/min/user (synthetic row on the same table,
namespaced ``RATE#<user>#PENDING`` so it does not collide with
the approve/deny counter).
``get-policies.ts`` (§7.6 — GET /v1/repos/{repo_id}/policies)
- Combines ``BUILTIN_HARD_DENY_POLICIES`` + ``BUILTIN_SOFT_DENY_POLICIES``
with the repo's ``cedar_policies`` blueprint override. Runs the
combined text through ``parseRules`` and returns
``{hard[], soft[]}`` rule summaries.
- 5-minute per-repo in-Lambda cache; cold starts throw it away.
``_resetCacheForTests`` exposed for unit-test isolation.
- Repo ID is URL-decoded from the path (``owner%2Frepo`` common in
CLI UX).
- Rate-limited 30/min/user.
- Blueprint load failure falls back to built-ins with a WARN log;
invalid blueprint cedar text returns 503 ``SERVICE_UNAVAILABLE``
rather than a misleading empty list.
``link-slack-user.ts`` (§11.2 finding aws-samples#4 — POST /v1/notifications/slack/link)
- Writes to SlackUserMappingTable with
``ConditionExpression: attribute_not_exists(slack_user_id)``. This
guard is the entire admission control the §11.2 design hinges on:
even a compromised Slack admin cannot overwrite an existing
mapping.
- Validates ``slack_user_id`` shape (letters, digits, underscores,
2–40 chars) so junk rows cannot land.
- Conflict surface is 409 ``REQUEST_ALREADY_DECIDED`` — reused
error code (the payload message directs the user to unlink via
support).
- Slack link_token end-to-end validation against Slack OAuth is
deferred — v1 accepts the token on trust from the Cognito-authed
caller; it is persisted in CloudWatch for audit.
Supporting primitives
---------------------
``shared/builtin-policies.ts`` — mirrors ``agent/policies/hard_deny.cedar``
and ``agent/policies/soft_deny.cedar`` as TypeScript string constants.
Embedded rather than read from disk because Lambda's esbuild bundler
does not copy non-TS assets by default and a dedicated bundling hook
is more code than the embed. A drift test
(``builtin-policies.test.ts``) asserts byte-equality with the agent
files so any change on one side without the other flips red at build
time.
``shared/cedar-policy.ts`` — ``parseRules`` now skips the unannotated
``base_permit`` entry (both tiers need it as a Cedar catch-all; it
is not a user-facing rule so it stays out of ParsedRule[]). This
matches the agent-side ``_parse_policy_annotations`` behaviour.
Tests: +37 total.
- get-pending (8): 401 on missing auth, 429 on rate limit, empty
result, GSI query shape, row → PendingApprovalSummary with
derived expires_at, severity fallback, missing timeout → expires_at
falls back to created_at, 500 on DDB error.
- get-policies (11): 401/400 validation, built-in rules listed on
empty repo, URL-decoded repo path, custom blueprint rule lands
in soft, per-repo cache across calls, 429 rate limit, 503 on
invalid blueprint cedar, fallback on load failure, hard rules
omit severity / approval_timeout_s, soft rules carry them.
- link-slack-user (8): 401/400 validation, shape check, 201 on
success, 409 on overwrite attempt, 500 on unknown DDB error,
whitespace trim on slack_user_id, ConditionExpression verified.
- builtin-policies (4): drift byte-equality with both agent files,
parseRules round-trip for hard/soft rule IDs.
- cedar-policy (updated): ``base_permit`` is skipped from
ParsedRule[] rather than rejected.
Stack wiring (task-api.ts routes, agent.ts layer attachment,
CreateTaskFn extension, orchestrator + reconciler + fanout) lands in
the next commit.
* docs(cedar-hitl): restore and revise HITL gates design, fold adversarial findings
Design doc was accidentally removed in 0742ebe; restored from b34d7cd and
substantially revised under a new filename. "Phase 3" framing dropped — this
is the Cedar HITL approval gates feature.
- Renamed PHASE3_CEDAR_HITL.md → CEDAR_HITL_GATES.md; all "phase" gating
removed (Phase 3a/3b → v1 / future work §17).
- Integrated 16 findings from 2026-05-06 adversarial review with realistic
scenarios. Major structural changes:
- Decision #23 (new): cross-engine parity contract between cedarpy (agent,
Python) and @cedar-policy/cedar-wasm@4.10.0 (Lambda, TS).
- §11.2: SlackUserMappingTable with OAuth user-initiated mapping; severity-
gated Slack approvals; admin has no write path.
- §7.1/§12.3: ApproveTaskFn uses cross-table TransactWriteItems for atomicity.
- §10.1: user_id-status-index GSI on TaskApprovalsTable; v1 not v-later.
- §15.6: cedar-wasm as a Lambda layer shared across policy Lambdas.
- Gate-cap revision (2026-05-07): decision #13 — default 50, blueprint-
configurable via security.approvalGateCap (bounded 1–500), persisted on
TaskTable. Cache memory bound decoupled: 50-entry LRU regardless of cap.
IMPL-22 adds telemetry-driven re-evaluation criteria.
- Timeout adversarial+advocate pass (2026-05-07):
- §6.5 VM-throttle race fix: re-read row on failed TIMED_OUT
ConditionCheckFailed; honor APPROVED if user beat the timer. IMPL-24.
- Sub-120s @approval_timeout_s emits blueprint-load WARN. IMPL-25.
- User-visible timeout cap milestones (approval_timeout_capped_at_submit,
approval_ceiling_shrinking). IMPL-26.
- Runtime JWT: no refresh logic in agent/src/ (container uses IAM role);
ceiling stays min(1h, maxLifetime_remaining - 120s). IMPL-27.
- Three new CloudWatch metrics for timeout tuning. IMPL-28.
- §14.8 new: off-hours trade-off section (fail-closed is the invariant).
- §13.13 new: notification-delivery failure does NOT pause the timer
(bypass-prevention).
- Added six mermaid diagrams: three-outcome decision flow, end-to-end round-
trip, TaskApprovalsTable state machine, Slack user-mapping, fail-closed
decision flow, cross-engine parity check.
- Cross-references updated in INTERACTIVE_AGENTS.md and SECURITY.md.
- Starlight mirror regenerated via docs/scripts/sync-starlight.mjs.
No code changes in this commit — design work only. Implementation lands in a
follow-up PR per §15.2 task list.
* feat(cedar-hitl): pin Cedar engines and seed cross-engine parity contract
Chunk 1 of the Cedar HITL gates PR (docs/design/CEDAR_HITL_GATES.md).
Lays the foundation before engine rewrites in Chunk 2+: both Cedar engines
pinned exactly per decision #23, annotation surface validated by Day-1
spikes per decision #22, and the golden-file parity fixtures seeded so
every subsequent chunk can rely on the contract.
- Pin cedarpy==4.8.0 (agent) and @cedar-policy/cedar-wasm@4.10.0 (cdk)
exactly (no ^/~); document both in mise.toml header.
- Add agent/tests/test_cedarpy_annotations_contract.py (10 tests)
validating all 5 annotations round-trip verbatim via
policies_to_json_str() under staticPolicies.<id>.annotations.
- Add cdk/test/handlers/shared/cedar-policy.test.ts (12 tests) validating
policySetTextToParts + policyToJson extract the same annotations
verbatim and isAuthorized returns the documented {type, response}
wrapper shape.
- Add contracts/cedar-parity/ with 5 golden-file fixtures (single-match,
multi-match, hard-deny, soft-deny write, no-match default-allow) +
README documenting the contract. Every fixture policy carries a
@rule_id - including the base permit as @rule_id("base_permit") - so
the parity tests raise if either engine returns an unannotated match
instead of silently dropping it.
- Add agent/tests/test_cedar_parity.py (6 tests, cedarpy side) and
cdk/test/handlers/shared/cedar-parity.test.ts (6 tests, cedar-wasm
side) loading the shared fixtures and asserting (decision, sorted
rule_ids) match expected. Both tests hard-import cedarpy/cedar-wasm
so a dependency regression fails loud rather than silently skipping.
- Update docs/design/CEDAR_HITL_GATES.md sections 15.2 row 3, 15.6
prose and the parity mermaid diagram to point at contracts/cedar-parity/
(the precedent set by contracts/memory-hash-vectors.json) instead of a
new tests/fixtures/ dir. Regenerate the Starlight mirror.
- Add IMPL-29 noting the cedarpy diagnostics.reasons / cedar-wasm
diagnostics.reason naming asymmetry surfaced by the spikes; engine
code normalizes at the boundary.
- Fix rev-4 -> rev-5 cosmetic footer drift.
Test counts: agent 500 -> 516 (+16), cdk 1036 -> 1054 (+18), cli 190
unchanged. No production code changes in this chunk; engine rewrite
lands in Chunk 2.
Follow-up: separate chore issue to move contracts/memory-hash-vectors.json
into a self-named subdir for consistency with contracts/cedar-parity/.
* feat(cedar-hitl): three-outcome PolicyEngine core
Chunk 2 of the Cedar HITL gates PR. Rewrites agent/src/policy.py into
the three-outcome engine specified in docs/design/CEDAR_HITL_GATES.md
section 6. The REQUIRE_APPROVAL outcome is the human-in-the-loop surface
the next chunks (PreToolUse hook extension, REST API, CLI) plug into.
This chunk ships the engine and its load-time validation; no hook or
wire-format changes yet.
Engine:
- Outcome enum (ALLOW, DENY, REQUIRE_APPROVAL) + extended PolicyDecision
with .allowed backward-compat shim for Phase 1a/1b/2 callers. Custom
__init__ accepts both outcome= and legacy allowed= kwargs so existing
tests keep working verbatim.
- Three-outcome pipeline per section 6.2: hard-deny eval (absolute) ->
allowlist fast-path (tool_type/tool_group/bash_pattern/write_path/
all_session) -> recent-decision cache (60s TTL on DENIED/TIMED_OUT) ->
soft-deny eval (with post-eval rule-scope allowlist check and
blueprint_disable filtering) -> default ALLOW.
- ApprovalAllowlist (section 6.4): parses and matches every scope type.
Strips whitespace and rejects empty-after-strip values so
"tool_type: Read " normalizes instead of silently mismatching (review
finding 6).
- RecentDecisionCache (section 12.9): 50-entry LRU, INDEPENDENT of
approvalGateCap. Populated only on DENIED/TIMED_OUT. Session-scoped
(documented section 12.8 caveat).
- Annotation handling (sections 5.2 + 6.3): parses @rule_id, @tier,
@approval_timeout_s, @severity, @category via
cedarpy.policies_to_json_str(); merges on multi-match with min timeout
(clamped by 30s floor) and max severity.
- Load-time validation (sections 5.1, 12.4): rejects missing/mismatched
@tier, missing @rule_id, sub-floor timeouts, duplicate rule_ids across
tiers, blueprint text > 64 KB, disable entries naming built-in
hard-deny rules (finding 9), approval_gate_cap outside [1, 500]
(decision 13). Sub-120s @approval_timeout_s emits WARN but accepts
(IMPL-25).
- Fail-closed posture (section 13): cedarpy parse errors surface via
diagnostics.errors -> RuntimeError raised inside _eval_tier -> outer
handler returns DENY with reason "fail-closed: <ExceptionType>".
TypeError on json.dumps of unhashable tool_input surfaces as distinct
"fail-closed: unhashable_tool_input" reason (review finding 5).
Built-in policies:
- agent/policies/hard_deny.cedar: base_permit catch-all + rm_slash +
write_git_internals + write_git_internals_nested + drop_table +
pr_review-specific Write/Edit forbids (absolute).
- agent/policies/soft_deny.cedar: base_permit (catch-all required in
each tier so cedarpy default-deny does not convert no-match into
DENY) + force_push_any + force_push_main + push_to_protected_branch
+ write_env_files + write_credentials. All soft rules carry @tier,
@rule_id, @approval_timeout_s, @severity, @category per section 15.4
starter set.
Review findings addressed (1 blocker, 8 significant, plus minor):
- blueprint_disable actually disables soft rules at eval time instead
of silently no-op (the blocker: test coverage had been a silent-pass).
- Legacy extra_policies with @tier/@rule_id rejected to avoid undefined
double-annotation behavior.
- _matching_rule_ids logs WARN on unknown policy IDs (state-drift
signal).
- base_permit validator exemption restricted to effect=="permit" so
misnamed forbid rules cannot bypass validation (finding 7).
- Hard-tier Cedar no_decision logged at WARN (signals missing/malformed
base_permit catch-all).
- Allowlist whitespace normalization + empty-value rejection.
- StrEnum upgrade, Callable moved to TYPE_CHECKING, assert replaced
with explicit RuntimeError for S101 compliance.
Phase 1 compatibility:
- All 39 existing test_policy.py tests pass unchanged via the
.allowed property. One test (test_invalid_policy_syntax_fails_closed)
updated to patch _hard_policies instead of the removed _policies
attribute; docstring explains the rewrite.
- extra_policies kwarg preserved; callers with annotated rules must
migrate to blueprint_soft_policies / blueprint_hard_policies.
Test counts: agent 516 -> 576 (+60: 51 three-outcome + 9 regression
fixes). cli 190 unchanged. cdk 1054 unchanged.
Carry-forward to Chunk 3:
- extra_policies semantic shift (Phase 1 DENY -> Chunk 2
REQUIRE_APPROVAL); .allowed=False preserved but .outcome differs.
Switchover happens when hooks.py adopts the three-outcome branching.
- Cross-tier action-context asymmetry (review finding 8): document
rule-authoring constraint in section 5.5 of design.
- Probe entity-shape coverage (finding 10): extend _probe_cedar to
exercise Write/Edit/Bash action paths, not just invoke_tool.
* feat(cedar-hitl): approval milestone writers + engine counters
Adds the 14 agent-side approval milestone writers (§11.1) on
``_ProgressWriter`` so Chunk 3's hook integration has a typed API
instead of stringly-typed ``write_agent_milestone`` calls, and the
per-task gate counter / per-container sliding-window rate limit /
denial-injection queue on ``PolicyEngine`` that §6.5 requires.
Why now: the hook work lands cleanly only after these surfaces exist
— every code path in ``pre_tool_use_hook``'s REQUIRE_APPROVAL branch
calls one of these helpers. Shipping them separately lets the hook
commit be about the state machine, not the event-shape bookkeeping.
Engine additions:
- ``approval_gate_count`` / ``increment_approval_gate_count``: the
per-task counter §12.9 bounds at ``approvalGateCap``. Session-scoped
in v1; persistence tracked in §17.
- ``approvals_in_last_minute`` / ``record_approval_gate_timestamp``:
sliding-window rate limit (20/min/container, §12.9). Prune on read
so callers see the current count without a separate tick.
- ``queue_denial_injection`` / ``drain_denial_injections``: queue
consumed by ``_denial_between_turns_hook`` at the next Stop seam
(§6.5). Reason is pre-sanitized upstream by ``DenyTaskFn``.
- ``mark_ceiling_shrinking_emitted``: emit-once latch for IMPL-26.
- ``APPROVAL_RATE_LIMIT`` / ``APPROVAL_RATE_WINDOW_S`` module consts
the hook imports rather than re-deriving.
Milestone writers (§11.1 table, 14 agent-emitted of 15):
- ``pre_approvals_loaded``, ``approval_requested``,
``approval_granted``, ``approval_denied``, ``approval_timed_out``,
``approval_stranded``, ``approval_write_failed``,
``approval_resume_failed``, ``approval_poll_degraded``,
``approval_timeout_capped``, ``approval_ceiling_shrinking``,
``approval_cap_exceeded``, ``approval_rate_limit_exceeded``,
``approval_late_win``.
- ``approval_decision_recorded`` (Lambda audit) and
``approval_timeout_capped_at_submit`` (CreateTaskFn) stay on the
Lambda side — Chunk 5 owns those.
Each helper is a thin wrapper over ``_put_event("agent_milestone",
...)`` so the shared circuit-breaker + classifier path (finding #6/#8)
continues to apply. Metadata keys mirror the §11.1 shapes verbatim
(``maxLifetime_remaining_s`` preserves the design-doc spelling for
downstream parsers).
Tests: +29 total. 17 on ``TestApprovalMilestoneHelpers`` pin the DDB
payload shape for each helper (including the two
``approval_timeout_capped`` reason variants — rule_annotation carries
matching_rule_ids; maxLifetime_ceiling omits the field). 12 on the
engine: counter monotonicity, rate-window prune semantics at window
boundary, denial-queue FIFO + drain-clears, ceiling-shrinking latch
idempotency.
No caller changes — engine and writer surfaces are additive. Hook
integration lands in commit C.
* feat(cedar-hitl): TaskApprovals + AWAITING_APPROVAL transition primitives
Adds the four agent-side DDB primitives §6.5 + IMPL-24 need for the
three-outcome hook integration in the next commit:
- ``transact_write_approval_request`` — cross-table TransactWriteItems:
Put(TaskApprovalsTable) with ``attribute_not_exists(request_id)`` +
Update(TaskTable) gated on ``status = RUNNING``. Atomic per §12.3 so
a concurrent cancel cannot land the task in AWAITING_APPROVAL with
no matching approval row (or vice versa).
- ``transact_resume_from_approval`` — Update(TaskTable) gated on
``status = AWAITING_APPROVAL AND awaiting_approval_request_id =
:rid``. The ``request_id`` condition prevents resuming with a stale
ID after a reconciler race (§13.9).
- ``best_effort_update_approval_status`` — conditional UpdateItem on
the approval row with ``status = :pending`` guard. Returns False on
``ConditionalCheckFailedException``; this is the signal IMPL-24's
re-read path fires on (§6.5 pseudocode lines 846-879, §13.12).
- ``get_approval_row`` — GetItem with ``ConsistentRead=True`` by
default. Required by IMPL-24's re-read; kept opt-out (bool flag) for
future cold-path callers that don't need the strong read.
Errors:
- ``ApprovalTablesUnavailable`` for env-var-missing — raised loud so
a pre-Chunk-4 deploy fails closed (hook will map to DENY) rather
than silently no-op'ing the gate.
- ``ApprovalWriteError`` / ``ApprovalResumeError`` wrap
``TransactionCanceledException`` with the cancellation reasons
list. The hook uses these to distinguish the "concurrent cancel"
branch from real DDB outages.
- ``ConditionalCheckFailedException`` on ``update_item`` is consumed
and returned as ``False`` from ``best_effort_update_approval_status``
— the caller (hook) needs the boolean to decide whether to
re-read, not to propagate.
- All other DDB errors propagate so the hook's outer try/except can
classify fail-closed with a specific reason.
Implementation notes:
- Uses ``boto3.client("dynamodb")`` low-level API (not resource).
``transact_write_items`` lives on the client, and marshalling the
approval row attributes explicitly gives deterministic DDB shapes
that the tests can assert on. ``_py_to_ddb_attr`` covers the
subset of Python types §10.1 actually uses (str/int/bool/None/list
of str); any other type raises TypeError loudly rather than
silently writing something unexpected.
- ``_extract_error_code`` / ``_extract_cancellation_reasons`` duck-type
on ``exc.response`` so we don't need botocore at import time (tests
use a minimal exception class).
- Errors from unsupported types (floats, dicts, etc.) are caught
BEFORE the DDB round-trip so the unit-test asserts
``transact_write_items`` was not called — catches schema drift
early.
- Status constants (``_STATUS_RUNNING`` / ``_STATUS_AWAITING_APPROVAL``)
named so a rename in CDK cannot silently diverge the Python path.
Tests: +20 total.
- 5 on TransactWriteApprovalRequest: env-missing, happy-path shape
assertion (both items + conditions), TransactionCanceled → ApprovalWriteError
with reasons preserved, other errors propagate, unsupported type rejected
before any DDB call.
- 3 on TransactResumeFromApproval: env-missing, happy-path expression
shape (includes REMOVE awaiting_approval_request_id), cancel →
ApprovalResumeError.
- 4 on BestEffortUpdateApprovalStatus: happy path returns True,
``reason`` kwarg attaches ``deny_reason``, ConditionalCheckFailed
returns False (IMPL-24's signal), other errors propagate.
- 4 on GetApprovalRow: ConsistentRead default True, opt-out False,
row-not-found returns None, row unmarshalling through every
supported DDB attribute type.
- 4 on helpers: error-code extraction with and without
ClientError-shape, cancellation-reasons extraction with and without.
No runtime callers yet — hook integration lands in commit C. Physical
TaskApprovalsTable lands in Chunk 4; Python side is wire-compatible so
the hook work can be unit-tested today with mocked clients.
* feat(cedar-hitl): PreToolUse three-outcome REQUIRE_APPROVAL path
Wires the agent to the full §6.5 pseudocode: cap + rate-limit check,
atomic TransactWriteItems for pending row + TaskTable AWAITING_APPROVAL,
2s→5s ConsistentRead poll, IMPL-24 VM-throttle race re-read, resume
transition, scope propagation to allowlist, and denial-injection queue
consumed at the next Stop seam. Completes §15.2 rows 26 + 27.
Hook control flow (three outcomes)
----------------------------------
- ALLOW / DENY: existing Phase 1 behavior, now switching on
``.outcome`` rather than ``.allowed``. Legacy Phase 1/2 tests still
green because PolicyDecision preserves the ``.allowed`` shim.
- REQUIRE_APPROVAL (new): extracted into ``_handle_require_approval``
for readability. Delegates to ``task_state`` primitives and
``engine.*`` counter surfaces from the prior commits; no new DDB
client construction here.
Key pieces:
- ``_compute_effective_timeout`` applies the §6.5 min(rule, default,
lifetime) formula. The engine's ``_merge_annotations`` has already
clipped decision.timeout_s against the task default; the hook adds
the remaining-lifetime ceiling and floors at FLOOR_30S.
``clip_reason`` distinguishes ``rule_annotation`` (rule was tighter
than task default) from ``maxLifetime_ceiling`` (task is late in
its life) so ``approval_timeout_capped`` carries the right reason.
- ``_remaining_maxlifetime_s`` reads ``AGENTCORE_MAX_LIFETIME_S`` +
``TASK_STARTED_AT`` env vars (8h default). Returns ``None`` when
the start timestamp is absent — the hook treats that as "unknown,
don't clip" rather than pre-DENYing, so Phase 1 test paths that
don't set the env var still see the old task-default behaviour.
Chunk 4/5 will wire these at task launch.
- ``_poll_for_decision`` uses 2s cadence for the first 30s then 5s
(IMPL-12). All polls use ``ConsistentRead=True`` per IMPL-24. 3
consecutive GetItem failures emit ``approval_poll_degraded``; 10
consecutive failures fall through as TIMED_OUT with a specific
reason (§13.2).
- ``_reconcile_late_decision`` implements IMPL-24 re-read: on a
ConditionCheckFailed from the TIMED_OUT write, re-read with
ConsistentRead. APPROVED → rebuild outcome, propagate scope to
allowlist, run normal allow flow, emit ``approval_late_win``.
DENIED → honor the user's sanitized reason. PENDING or row gone
→ fall through with TIMED_OUT (fail-closed, §13.12 last paragraph).
Cancel-wins semantics (finding #2)
----------------------------------
``_denial_between_turns_hook`` is registered AFTER
``_nudge_between_turns_hook`` in ``between_turns_hooks`` so cancel
short-circuits both. The hook re-checks ``_cancel_requested`` itself
as belt-and-braces (matching the nudge hook) so a future reorder does
not silently break cancel-wins. Denial queue is PRESERVED on cancel —
not drained — so a denial still sitting on the queue when the task is
being torn down does not leak across tasks (the engine is per-task
per §IMPL-7).
``stop_hook`` threads ``engine`` into ``ctx`` so the denial hook can
``drain_denial_injections``. ``build_hook_matchers`` accepts a new
``user_id`` kwarg (§12.2) so approval rows carry caller identity for
the REST side's ownership check.
``permissionDecisionReason`` guaranteed surface
-----------------------------------------------
The hook's deny return is the ONLY guaranteed surface the SDK emits
to the agent; denial injection is best-effort (pre-empted by cancel).
``_deny_response`` pipes every reason through ``_strip_ansi`` +
``_truncate(500)``: ANSI sequences can never reach the model, and the
line stays loggable. §12.7 requirement.
Tests: +24 agent hook tests (47 total in test_hooks.py). Run in 0.92s
via a ``_fast_poll`` fixture that collapses ``asyncio.sleep`` to a
no-op AND advances ``hooks.time.monotonic`` by the requested duration
so the poll wall-clock deadline actually trips.
Happy paths:
- APPROVED + scope propagation to allowlist + milestones.
- APPROVED with scope=this_call does NOT grow allowlist.
- DENIED queues denial injection + populates recent-decision cache
(next identical call auto-denies).
- TIMED_OUT writes TIMED_OUT row and emits approval_timed_out.
IMPL-24 race: four branches.
- APPROVED re-read → allow flow, approval_late_win milestone, scope
propagated, resume succeeds.
- DENIED re-read → deny flow, approval_late_win milestone, user's
reason is the permissionDecisionReason.
- Still-PENDING re-read → fail-closed fall-through (no late_win).
- Row-gone re-read → same fail-closed fall-through.
Cap / rate-limit / write failure / resume failure branches all:
- Short-circuit before any DDB write when the local guard fires
(cap, rate limit).
- Emit the right approval_* milestone.
- Return DENY with a specific permissionDecisionReason.
Sanitization:
- ANSI stripped from deny reason.
- Deny reason truncated to ≤500 chars.
Timeout clipping:
- rule_annotation reason when a rule's approval_timeout_s is below
the task default; matching_rule_ids populated.
- maxLifetime_ceiling reason when remaining lifetime is the tightest
bound; matching_rule_ids is None.
- approval_ceiling_shrinking emits exactly once per task (IMPL-26
latch).
Denial injection hook (6 tests):
- Draining produces a <user_denial request_id=... decided_at=...>
block with XML-escaped reason.
- Cancel short-circuit preserves the queue so the denial is not
lost; just not injected into a dying agent.
- Hostile reason (</user_denial>...<user_nudge>) is XML-escaped so
the envelope cannot be forged.
- No-engine ctx returns [] (Phase 1 call sites still work).
- Registered LAST in ``between_turns_hooks`` (invariant for §6.5
finding #2).
- End-to-end via stop_hook: queued denial becomes
``decision=block`` + reason on the Stop return.
Carry-forward
-------------
- ``_remaining_maxlifetime_s`` returns None when TASK_STARTED_AT is
unset — Chunk 4/5 will wire this at task launch. Tracked in §16.
- ``approval_gate_count`` lives on the engine (session-scoped) not on
TaskTable in v1. §13.6 notes that the reconciler + approval_gate_cap
still bound worst-case across container restarts. Chunk 7+ tracks
persistence when telemetry justifies it.
- Denial injection emits a ``user_denial_injected`` milestone that is
NOT in the §11.1 enumerated table. It mirrors ``nudge_acknowledged``
for stream visibility; keep the name distinct from the ``approval_*``
prefix so future §11.1 consumers can't confuse it with an approval
outcome.
* feat(cedar-hitl): TaskApprovalsTable + SlackUserMapping + status enum
Lands the stateless CDK primitives for Cedar-HITL approval gates so
Chunk 5's REST handlers can be wired onto concrete tables. Completes
§15.2 tasks 9, 20, and 25.
Constructs
----------
``TaskApprovalsTable`` (§10.1)
- PK ``task_id`` + SK ``request_id`` (ULID). Matches the agent-side
primitives landed in the prior commit.
- GSI ``user_id-status-index`` with user_id PK + status SK and an
``INCLUDE`` projection limited to the fields GET /v1/pending
renders. Three deny-sensitive attrs (``deny_reason``, ``scope``,
``tool_input_sha256``) deliberately omitted from the projection —
the list endpoint only returns PENDING rows in practice, but
excluding them kills the projection-leak concern outright and
costs no bytes today.
- Exports ``USER_STATUS_INDEX_NAME`` as a module constant + mirrors
it on ``construct.userStatusIndexName`` so handlers referencing
the GSI fail compile-time on a rename.
- TTL attribute ``ttl`` (agent writes ``created_at + timeout_s +
120s``).
- No DynamoDB streams per §11.2. TaskEventsTable carries the audit
fan-out; streams here would duplicate.
- Default RemovalPolicy.DESTROY to match the rest of the sample.
Production deploys override to RETAIN per §10.1.
``SlackUserMappingTable`` (§11.2, finding #4)
- Single-key (``slack_user_id`` PK). No SK, no TTL, no GSI, no
stream. The forward-only shape is the trust boundary — a reverse
GSI (Cognito → Slack) would let a compromised Cognito sub
enumerate Slack identities without adding v1 capability.
- Writes land through LinkSlackUserFn (Chunk 5) which enforces the
``attribute_not_exists(slack_user_id)`` condition so a prior
legitimate mapping cannot be overwritten by a later compromise.
``task-status.ts`` — AWAITING_APPROVAL (§10.3)
- Added to TaskStatus enum + ACTIVE_STATUSES (NOT TERMINAL_STATUSES:
the task is alive, paused on a human decision).
- VALID_TRANSITIONS wires the five edges §10.3 enumerates:
RUNNING → AWAITING_APPROVAL (soft-deny entry)
HYDRATING → AWAITING_APPROVAL (rare early-gate case)
AWAITING_APPROVAL → RUNNING (approve / deny resume)
AWAITING_APPROVAL → CANCELLED (user cancel mid-approval)
AWAITING_APPROVAL → FAILED (stranded-approval reconciler)
- Notably NOT added:
AWAITING_APPROVAL → FINALIZING (approve-during-cleanup race)
AWAITING_APPROVAL → COMPLETED (skip RUNNING)
AWAITING_APPROVAL → TIMED_OUT (timer lives on the approval
row, not the task clock)
These are regression tests so a future refactor cannot quietly
add them and bypass the `awaiting_approval_request_id = :rid`
invariant.
Tests: +29 total.
- TaskApprovalsTable (11 tests): PK/SK schema, PAY_PER_REQUEST,
PITR default + override, TTL attribute, NO streams, GSI schema +
projection + sensitive-attr exclusion, removal policy default +
override, ``USER_STATUS_INDEX_NAME`` constant parity with the
construct field.
- SlackUserMappingTable (8 tests): single-key schema (explicit
KeySchema length assertion), PAY_PER_REQUEST, PITR, no streams,
no reverse GSI, DESTROY default, TTL absent.
- TaskStatus (+10 tests over existing: 5 new assertions on the
9-state cardinality, AWAITING_APPROVAL membership, and the
transition graph including the three forbidden edges). The
existing assertions updated for the new state count.
No stack wiring yet — ``agent.ts`` instantiation + env var plumbing +
grants land in the next commit alongside the Cedar-WASM Lambda layer.
* feat(cedar-hitl): Cedar-wasm layer + wire approval tables into agent stack
Activates the agent-side approval path and ships the Lambda layer
Chunk 5's REST handlers need.
Cedar-wasm Lambda layer (§15.2 task 10)
----------------------------------------
``CedarWasmLayer`` bundles ``@cedar-policy/cedar-wasm@4.10.0`` into
``/opt/nodejs/node_modules/`` so Lambdas can
``require('@cedar-policy/cedar-wasm/nodejs')`` without shipping the
4 MB wasm binary in every function package. A dedicated
``cdk/layers/cedar-wasm/`` directory carries a minimal ``package.json``
pinning the exact version — bundling runs ``npm install --omit=dev``
against that manifest, so the layer build is hermetic from any
``cdk/node_modules/`` drift.
The bundler has two fallbacks:
- Docker (``public.ecr.aws/sam/build-nodejs22.x``) for CI / prod
deploys.
- Local-npm fallback for environments without Docker (unit-test
synths + `cdk synth` on runners that lack Docker). The local
path is safe here because the layer ships pure JS + a prebuilt
wasm binary — no native build step.
Three constants exposed from the module:
- ``CEDAR_WASM_VERSION`` — single source of truth for the pinned
version; tests assert this matches both ``cdk/package.json`` and
the layer manifest, so the three places the version lives stay
in sync.
- ``CEDAR_WASM_MIN_LAMBDA_MEMORY_MB`` — 512 MB floor for attaching
Lambdas per §15.2 task 10.
- ``CedarWasmLayer.layer`` — the underlying ``LayerVersion`` for
Chunk 5 handlers to attach via ``fn.addLayers(...)``.
Agent stack wiring (§15.2 task 19)
------------------------------------
``agent.ts`` now instantiates:
- ``TaskApprovalsTable`` (prior commit) — grants RW to the runtime
so ``pre_tool_use_hook`` can TransactWriteItems + ConsistentRead
the PENDING row.
- ``SlackUserMappingTable`` (prior commit) — not granted to the
runtime; only the link-user Lambda (Chunk 5) writes here.
- ``CedarWasmLayer`` — the layer's asset lands in the synthed
template so Chunk 5 handlers can reference ``.layer`` without
causing a new asset on their deploy.
New runtime env vars:
- ``TASK_APPROVALS_TABLE_NAME`` — consumed by
``task_state._require_tables``; its absence previously raised
``ApprovalTablesUnavailable`` → hook DENY. Now set, so the
approval path is live on deploy.
- ``AGENTCORE_MAX_LIFETIME_S = '28800'`` — 8 hours, matching
``lifecycleConfiguration.maxLifetime``. Consumed by the hook's
``_remaining_maxlifetime_s`` for the maxLifetime ceiling clip
(§6.5). Kept in sync with the lifecycle via a direct test
assertion so drift surfaces at build time.
New CfnOutputs: ``TaskApprovalsTableName``, ``SlackUserMappingTableName``,
``CedarWasmLayerArn``. Each is useful for post-deploy smoke tests
(`aws dynamodb describe-table` / `aws lambda get-layer-version`).
Tests: +8 layer tests + 9 agent-stack assertions.
Layer:
- LayerVersion resource count.
- Compatible runtimes (nodejs20/22).
- Description carries the pinned version.
- CEDAR_WASM_VERSION matches ``cdk/package.json``.
- CEDAR_WASM_VERSION matches ``layers/cedar-wasm/package.json``.
- CEDAR_WASM_MIN_LAMBDA_MEMORY_MB ≥ 512.
- Custom description override works.
- ``.layer`` exposes a real ``LayerVersion``.
Agent stack:
- Table count updated from 6 → 8.
- TaskApprovalsTable schema match (task_id PK / request_id SK,
user_id-status-index GSI presence).
- SlackUserMappingTable single-key schema.
- LayerVersion count + compatibleRuntimes.
- Three new CfnOutputs present.
- TASK_APPROVALS_TABLE_NAME env var on the runtime.
- AGENTCORE_MAX_LIFETIME_S == '28800' (drift guard).
Carry-forward
-------------
- ``TASK_STARTED_AT`` is the other input the hook's
``_remaining_maxlifetime_s`` consumes — it's a PER-TASK value the
orchestrator must stamp at invocation time, not a stack-level env
var. Chunk 5's orchestrator changes need to add it to the runtime
invocation payload / session env. For now the hook's fallback
("unknown, don't clip") keeps approvals functional.
- Chunk 5 will attach the CedarWasmLayer onto ApproveTaskFn,
DenyTaskFn, GetPoliciesFn, CreateTaskFn and assert
``memorySize >= CEDAR_WASM_MIN_LAMBDA_MEMORY_MB`` for each.
* feat(cedar-hitl): approve + deny handlers + shared types (§7.1, §7.2)
Lands the two user-facing REST handlers that flip a PENDING approval
row to APPROVED / DENIED, the shared types both call sites and the
CLI consume, and the Lambda-side Cedar parser future Chunk-5 handlers
(get-policies, create-task validation) will use.
Wire types (shared/types.ts)
----------------------------
- ApprovalScope union covering every shape the agent's
ApprovalAllowlist understands. Typed so approve-task / create-task /
CLI (Chunk 6) all agree at compile time.
- ApprovalRecord / ApprovalRequest / ApprovalResponse / DenyRequest /
DenyResponse / PendingApprovalSummary / GetPendingResponse /
PolicyRuleSummary / GetPoliciesResponse / LinkSlackUserRequest /
LinkSlackUserResponse / SlackUserMappingRecord /
ApprovalDecisionRecordedEvent / CreateTaskApprovalExtensions.
- Constants: DENY_REASON_MAX_LENGTH=2000, INITIAL_APPROVALS_MAX_ENTRIES=20,
INITIAL_APPROVALS_MAX_ENTRY_LENGTH=128, APPROVAL_TIMEOUT_S_MIN=30,
APPROVAL_TIMEOUT_S_MAX=3600, APPROVAL_TIMEOUT_S_DEFAULT=300.
New error codes: REQUEST_NOT_FOUND, REQUEST_ALREADY_DECIDED,
TASK_NOT_AWAITING_APPROVAL.
Shared helpers
--------------
- shared/approval-scope.ts — parseApprovalScope validates every shape;
rejects unknown tool types / groups / prefixes, empty values,
over-128-char strings. isDegeneratePattern implements §7.4 (length
≤ 2, all-wildcard, wildcard ratio > 50%) for Chunk-5 create-task.
- shared/deny-reason-scanner.ts — scanDenyReason redacts AWS keys,
GitHub PATs (classic + fine-grained), Slack tokens, PEM blocks,
bearer tokens with [REDACTED-...] markers. Mirrors
agent/src/output_scanner.py so the deny reason the agent
ultimately reads is never raw user input.
- shared/cedar-policy.ts — parseRules pulls the five HITL annotations
(tier/rule_id/severity/approval_timeout_s/category) into a
ParsedRule[], preserving positional policy_id for IMPL-29
diagnostics-to-rule_id resolution. isHardDenyRule, isValidRuleId,
matchingRuleIds, concatPolicies exposed for future handlers.
Handlers
--------
- approve-task.ts (§7.1) — POST /v1/tasks/{task_id}/approve
- Cross-table TransactWriteItems: approval row PENDING → APPROVED
guarded by user_id = :caller AND status = :pending; TaskTable
no-op Update guarded by status = AWAITING_APPROVAL AND
awaiting_approval_request_id = :rid.
- TransactionCanceledException classified by per-item
CancellationReasons. Approval-row failure collapses to 404
REQUEST_NOT_FOUND (no existence oracle per §7.1 finding #6);
task-row failure → 409 TASK_NOT_AWAITING_APPROVAL.
- Optional scope defaults to this_call.
- Per-user per-minute rate limit (30/min, synthetic row).
- Writes approval_decision_recorded audit event (IMPL-6). Audit
failure is logged but does not fail the request — decision is
already committed.
- deny-task.ts (§7.2) — POST /v1/tasks/{task_id}/deny
- Same cross-table pattern; status → DENIED + deny_reason.
- Reason is scanDenyReason-sanitized + truncated to
DENY_REASON_MAX_LENGTH BEFORE any persistence — agent and audit
both read sanitized form; raw input never stored.
- Same rate-limit namespace as approve.
Tests: +64 total (cedar-policy-parser 24, approval-scope 28,
deny-reason-scanner 13, approve-task 14, deny-task 9). Secret test
fixtures are assembled from string fragments so the source never
holds a contiguous secret literal — Code Defender pre-commit hook
otherwise blocks.
Stack wiring (task-api.ts routes, agent.ts layer attachment,
CreateTaskFn extension, orchestrator + reconciler + fanout +
LinkSlackUserFn + GetPolicies + GetPending) lands in the next
commit.
* feat(cedar-hitl): get-pending + get-policies + link-slack-user handlers
Lands the three read/discovery handlers Chunk 6 (CLI) needs to power
``bgagent pending``, ``bgagent policies list/show``, and
``bgagent notifications configure slack``. Completes §15.2 tasks
14, 15, and 25 (handler side).
Handlers
--------
``get-pending.ts`` (§7.7 — GET /v1/pending)
- Queries ``user_id-status-index`` GSI on TaskApprovalsTable with
``user_id = :caller AND status = :pending``. Without the GSI
this would be a full-table Scan per call — under
``watch -n1 bgagent pending`` that exhausts burst capacity for
the whole fleet (§10.1 finding #8).
- Response maps each row to ``PendingApprovalSummary`` with a
derived ``expires_at = created_at + timeout_s`` so the CLI can
render time-to-timeout without doing arithmetic on ISO strings.
- Severity coerced to ``medium`` on unknown values so GSI writes
that drift from the enum don't break the list response.
- Rate-limited 10/min/user (synthetic row on the same table,
namespaced ``RATE#<user>#PENDING`` so it does not collide with
the approve/deny counter).
``get-policies.ts`` (§7.6 — GET /v1/repos/{repo_id}/policies)
- Combines ``BUILTIN_HARD_DENY_POLICIES`` + ``BUILTIN_SOFT_DENY_POLICIES``
with the repo's ``cedar_policies`` blueprint override. Runs the
combined text through ``parseRules`` and returns
``{hard[], soft[]}`` rule summaries.
- 5-minute per-repo in-Lambda cache; cold starts throw it away.
``_resetCacheForTests`` exposed for unit-test isolation.
- Repo ID is URL-decoded from the path (``owner%2Frepo`` common in
CLI UX).
- Rate-limited 30/min/user.
- Blueprint load failure falls back to built-ins with a WARN log;
invalid blueprint cedar text returns 503 ``SERVICE_UNAVAILABLE``
rather than a misleading empty list.
``link-slack-user.ts`` (§11.2 finding #4 — POST /v1/notifications/slack/link)
- Writes to SlackUserMappingTable with
``ConditionExpression: attribute_not_exists(slack_user_id)``. This
guard is the entire admission control the §11.2 design hinges on:
even a compromised Slack admin cannot overwrite an existing
mapping.
- Validates ``slack_user_id`` shape (letters, digits, underscores,
2–40 chars) so junk rows cannot land.
- Conflict surface is 409 ``REQUEST_ALREADY_DECIDED`` — reused
error code (the payload message directs the user to unlink via
support).
- Slack link_token end-to-end validation against Slack OAuth is
deferred — v1 accepts the token on trust from the Cognito-authed
caller; it is persisted in CloudWatch for audit.
Supporting primitives
---------------------
``shared/builtin-policies.ts`` — mirrors ``agent/policies/hard_deny.cedar``
and ``agent/policies/soft_deny.cedar`` as TypeScript string constants.
Embedded rather than read from disk because Lambda's esbuild bundler
does not copy non-TS assets by default and a dedicated bundling hook
is more code than the embed. A drift test
(``builtin-policies.test.ts``) asserts byte-equality with the agent
files so any change on one side without the other flips red at build
time.
``shared/cedar-policy.ts`` — ``parseRules`` now skips the unannotated
``base_permit`` entry (both tiers need it as a Cedar catch-all; it
is not a user-facing rule so it stays out of ParsedRule[]). This
matches the agent-side ``_parse_policy_annotations`` behaviour.
Tests: +37 total.
- get-pending (8): 401 on missing auth, 429 on rate limit, empty
result, GSI query shape, row → PendingApprovalSummary with
derived expires_at, severity fallback, missing timeout → expires_at
falls back to created_at, 500 on DDB error.
- get-policies (11): 401/400 validation, built-in rules listed on
empty repo, URL-decoded repo path, custom blueprint rule lands
in soft, per-repo cache across calls, 429 rate limit, 503 on
invalid blueprint cedar, fallback on load failure, hard rules
omit severity / approval_timeout_s, soft rules carry them.
- link-slack-user (8): 401/400 validation, shape check, 201 on
success, 409 on overwrite attempt, 500 on unknown DDB error,
whitespace trim on slack_user_id, ConditionExpression verified.
- builtin-policies (4): drift byte-equality with both agent files,
parseRules round-trip for hard/soft rule IDs.
- cedar-policy (updated): ``base_permit`` is skipped from
ParsedRule[] rather than rejected.
Stack wiring (task-api.ts routes, agent.ts layer attachment,
CreateTaskFn extension, orchestrator + reconciler + fanout) lands in
the next commit.
* feat(cedar-hitl): wire Chunk 5 routes + orchestrator + reconciler + agent plumbing
Completes Chunk 5 end-to-end: the five new Lambdas are instantiated
and wired onto the REST API, the orchestrator threads approval-related
data through to the agent runtime, the stranded-task reconciler sweeps
AWAITING_APPROVAL tasks, and the agent pipeline accepts the new
per-task approval configuration.
Stack wiring (agent.ts + task-api.ts)
-------------------------------------
- TaskApi construct accepts `taskApprovalsTable`, `slackUserMappingTable`,
`cedarWasmLayer` props. Approve/Deny/GetPending Lambdas are created
when the approvals table is present; GetPolicies also requires the
cedar-wasm layer + RepoTable. Slack-link Lambda attaches when the
slack mapping table is provided.
- New routes:
POST /tasks/{task_id}/approve
POST /tasks/{task_id}/deny
GET /pending
GET /repos/{repo_id}/policies
POST /notifications/slack/link
- GetPoliciesFn configures `memorySize: 512` (Cedar-wasm floor from
§15.2 task 10) and externalizes `@cedar-policy/cedar-wasm` from the
esbuild bundle so the layer provides the wasm binary at runtime.
- CedarWasmLayer compatibleRuntimes extended to include nodejs24.x
(the Lambda runtime) — the Node 20/22 list was the original §15.2
spec but the actual function uses Node 24.
- agent.ts passes all three new constructs into TaskApi.
Orchestrator (shared/orchestrator.ts)
-------------------------------------
- `finalizeTask` now treats AWAITING_APPROVAL as a "task still alive"
terminal-timeout source: on poll exhaustion the task transitions to
TIMED_OUT with a distinct `approval_poll_timeout` reason + error
message ("Orchestrator poll timeout exceeded while awaiting approval").
The stranded-approval reconciler is the secondary safety net (§13.6)
for tasks the orchestrator already lost track of.
- Invocation payload now carries three new fields:
- `task_started_at` (ISO 8601 at HYDRATING → RUNNING time) —
consumed by the agent hook's `_remaining_maxlifetime_s` so the
§6.5 maxLifetime ceiling math uses the real task clock instead
of the fail-open fallback.
- `approval_timeout_s` (when the submit payload supplied it).
- `initial_approvals` (when the submit payload supplied entries).
Stranded-task reconciler
------------------------
- Sweeps AWAITING_APPROVAL in addition to SUBMITTED/HYDRATING.
- New `APPROVAL_STRANDED_TIMEOUT_SECONDS` env var (default 7200s =
2h) — double §7.3's 1h ceiling so this reconciler never races the
happy-path timer.
- Distinct failure message on approval-stranded vs generic-stranded
so users see "approval stranded — container evicted" rather than
the misleading "no pipeline attached" copy.
Fanout (handlers/fanout-task-events.ts)
---------------------------------------
- Slack channel default set replaces the forward-compat
`approval_required` stub with the real §11.1 events:
`approval_requested` and `approval_stranded`. Other approval
milestones (granted/denied/timed_out/late_win/etc.) stay out of
default routing to avoid notification fatigue — the CLI surfaces
those confirmations directly.
- Email default replaces `approval_required` with `approval_requested`
(high-severity gates only; severity gating happens in the dispatcher).
Create-task validation (shared/create-task-core.ts)
---------------------------------------------------
- New request fields:
- `approval_timeout_s` — integer within
`[APPROVAL_TIMEOUT_S_MIN, APPROVAL_TIMEOUT_S_MAX]`.
- `initial_approvals` — array of scope strings; each entry must
be a valid `ApprovalScope` per `parseApprovalScope`; bash_pattern
and write_path scopes get the §7.4 degenerate-pattern check.
- TaskRecord extended with `approval_timeout_s`, `initial_approvals`,
`approval_gate_count` (seeded to 0 at admission), and
`awaiting_approval_request_id` (written atomically by the agent's
`transact_write_approval_request` primitive).
Agent plumbing (models.py / config.py / pipeline.py / runner.py / server.py)
----------------------------------------------------------------------------
- `TaskConfig` adds `approval_timeout_s`, `initial_approvals`.
- `build_config`, `run_task`, `_run_task_background`, and
`_extract_invocation_params` thread the two new fields from payload
→ config → PolicyEngine.
- `server._extract_invocation_params` stamps `os.environ["TASK_STARTED_AT"]`
from the payload so the hook's `_remaining_maxlifetime_s` returns
real values (carry-forward from Chunk 3 resolved).
- `runner.py` constructs PolicyEngine with `initial_approvals` +
`task_default_timeout_s` when supplied; the engine clamps bad
values at construction time.
Tests
-----
All CDK tests pass: 1219 / 1219.
All agent tests pass: 648 / 648.
Affected suites (changes only):
- test/stacks/agent.test.ts: cedar-wasm layer CompatibleRuntimes
now expects `nodejs24.x`; table count still 8.
- test/constructs/cedar-wasm-layer.test.ts: same runtime expansion.
- test/handlers/fanout-task-events.test.ts: approval_required →
approval_requested/approval_stranded in Slack default set;
approval_required → approval_requested in Email default set.
- test/handlers/reconcile-stranded-tasks.test.ts: primeResponses
now queue a third `Items: []` for AWAITING_APPROVAL queries;
queryCalls assertion bumped to 3.
Carry-forward (non-blocking)
----------------------------
- GetPoliciesFn has write access to TaskApprovalsTable (for the
rate-limit counter path). A future permissions audit should
tighten this to a single-item write scoped to `RATE#<user>#*`.
- TASK_STARTED_AT env var is only set when a payload supplies it;
server.py still supports the Phase 2 no-payload startup path.
* feat(cedar-hitl): Chunk 6 CLI — approve / deny / pending / policies
Ships the four user-facing commands that close the Cedar HITL loop:
once Chunks 1-5 have a PENDING approval row and the Slack/Email fan-out
has notified the user, Chunk 6 is how they actually respond.
New commands (cli/src/commands/)
--------------------------------
- `bgagent approve <task-id> <request-id> [--scope <scope>] [--yes]`
Default scope is `this_call`; callers extend allowlist with
`tool_type:Bash`, `rule:<id>`, etc. `all_session` is the only scope
that requires `--yes` to confirm — mirrors the safety UX from
§8.4. Error classification maps 404 → "run `bgagent pending`", 409
→ "task no longer awaiting approval", 429 → rate-limit, 401 → login.
- `bgagent deny <task-id> <request-id> [--reason ... | --reason-file ...]`
`--reason-file` accepts multi-line reasons that would otherwise
need shell quoting. Client-side `DENY_REASON_MAX_LENGTH` cap avoids
a round-trip on obviously-too-long reasons; the server still
truncates. Reason is sanitized server-side (output_scanner) before
ever reaching the agent.
- `bgagent pending [--output text|json]`
Lists every PENDING approval owned by the caller. Rendered with
approve/deny hints inline so the user can copy-paste the next
command. JSON output for scripting. Rate-limited server-side.
- `bgagent policies list --repo <owner/repo> [--tier hard|soft]`
`bgagent policies show --repo <owner/repo> --rule <rule_id>`
Discovery commands so users can find rule IDs without reading CDK
source. Both subcommands reuse a single `listPolicies` API call
and filter locally.
Wire changes
------------
- `cli/src/api-client.ts`: `approveTask`, `denyTask`, `listPending`,
`listPolicies` — each matching the §7.1 / §7.2 / §7.6 / §7.7
request/response shapes. `approveTask` omits the `scope` body field
when unset so the server's `this_call` default applies.
- `cli/src/types.ts`: mirrors the Chunk 5 server types verbatim —
`ApprovalScope` union, `ApprovalRequest/Response`, `DenyRequest/Response`,
`PendingApprovalSummary`, `GetPoliciesResponse`, `PolicyRuleSummary`,
plus the five constants (`DENY_REASON_MAX_LENGTH`,
`INITIAL_APPROVALS_MAX_ENTRIES`, `INITIAL_APPROVALS_MAX_ENTRY_LENGTH`,
`APPROVAL_TIMEOUT_S_MIN/MAX/DEFAULT`).
- `cli/src/bin/bgagent.ts`: registers the four new commands in the
order they appear in help output.
Tests: +27 new (217 total).
- approve (9): default scope, custom scope, all_session guard +
`--yes` bypass, JSON output, 404/409/401/429 error classifications.
- deny (6): no-reason path, `--reason`, `--reason-file` with
tmpdir fixture, mutually-exclusive rejection, over-length rejection,
404 classification.
- pending (5): empty render, populated render with approve/deny
hints, JSON output, 401 and 429 classifications.
- policies (7): list both tiers, `--tier` filter, `--output json`,
bad `--tier`, show found rule, show unknown rule, 404
repo-not-onboarded classification.
Carry-forward
-------------
- `submit.ts` extension with `--approval-timeout` / `--pre-approve`
flags is deferred to a follow-up commit — the server already accepts
these fields on POST /v1/tasks (Chunk 5), and `bgagent submit`
already forwards unknown payload fields through the existing
request path, so users can set them via `--body-file` today until
the explicit flags land.
- `--output json` on error branches currently returns a CliError
instead of a JSON error envelope; matches the pattern the existing
commands use (status, cancel, nudge). Follow-up to standardize
JSON error envelopes across the whole CLI if that becomes a
common scripting pain point.
* feat(cedar-hitl): Chunk 7a — persist gate counter + IMPL-23 cache observability
Persist approval_gate_count to TaskTable across container restarts per
§13.6 so the cumulative gate budget survives eviction. Emit
pre_approvals_loaded after PolicyEngine init per §4 step 7 / §11.1 so
operators see the starting approval posture in the live SSE stream.
Add IMPL-23 cache-hit observability: cache hits attach metadata to
PolicyDecision, hook forwards to new write_policy_decision_cached
progress helper (decision_source="recent_decision_cache").
Why: container restarts were silently resetting the per-task gate
counter, re-exposing users to another approvalGateCap-worth of gates
per restart. Cache-driven denies were invisible in TaskEventsTable
beyond the initial gate. Fresh tasks emitted no "starting posture"
signal so dashboards could not distinguish "no pre-approvals seeded"
from "agent has not started".
Surface additions:
- task_state.increment_approval_gate_count_in_ddb — best-effort
atomic ADD on approval_gate_count
- PolicyEngine(initial_approval_gate_count=N) — seed session counter
- TaskConfig.initial_approval_gate_count — orchestrator payload field
- progress_writer.write_policy_decision_cached — IMPL-23 emitter
- PolicyDecision.cache_hit_metadata — observability-only field
- _CachedDecision.original_decision_ts — wall-clock preservation
- runner._initialize_policy_engine_and_hooks — extracted helper
Counter survival is a safety bound, not correctness: DDB failure
does NOT block the gate (§13.6). Joint-update invariant on status
+ awaiting_approval_request_id (§10.2) is preserved — counter uses
separate UpdateItem, not merged into resume transaction.
Tests: +36 agent (648→684), +8 CDK (1219→1227), +6 new runner tests.
* feat(cedar-hitl): Chunk 7b — persist approval_gate_cap from blueprint
Capture the per-task approval-gate cap at submit-time (§4 step 5,
decision #13, §13.6) so a blueprint-configured override is frozen
onto the TaskRecord. Mid-task blueprint edits cannot shift the cap
beneath a running task; container restarts re-seed the agent's
PolicyEngine from the persisted value instead of its compile-time
default-50.
Why: Chunk 7a added approval_gate_count persistence but the cap
itself was still resolved from the blueprint on every restart —
so an operator lowering security.approvalGateCap mid-task would
retroactively fail-close the running task. The design has always
said cap is frozen at submit; this chunk makes the implementation
match.
Surface additions:
- BlueprintProps.security.approvalGateCap (CDK, synth-validated
[1, 500] integer) — new per-repo blueprint prop
- RepoConfig.approval_gate_cap + BlueprintConfig.approval_gate_cap
- TaskRecord.approval_gate_cap + APPROVAL_GATE_CAP_{MIN,MAX,DEFAULT}
- create-task-core now calls loadRepoConfig, resolves cap, bounds-
checks, persists; returns 503 SERVICE_UNAVAILABLE on invalid
blueprint data (permanent until admin re-deploys, not transient)
- orchestrator.ts: isValidApprovalGateCap integer+bounds guard;
logs warn if a persisted cap is structurally invalid (schema
drift / hand-edited DDB row)
- TaskConfig.approval_gate_cap: int | None = None (agent-side);
runner threads to PolicyEngine kwarg when not None
- "Task created" log line now carries approval_gate_cap +
approval_gate_cap_source ("blueprint" | "platform_default") so
operators can detect a broken-plumbing deploy at the single
chokepoint where all fallback layers converge
Per silent-failure review:
- HIGH: 500 → 503 + logger.error for permanent misconfig
- HIGH: cap + source in task-created log (catches 4-layer cascade)
- MEDIUM: orchestrator guard tightened past typeof (NaN, Infinity,
floats, out-of-bounds all omitted + warned)
Tests: CDK 1263/1263 (+36), agent 694/694 (+10). CLI unchanged.
* feat(cedar-hitl): Chunk 7c — observability wrap-up for resolved cap + warn path
Close three deferred items from Chunks 7a/7b before Chunks 8-10:
- runner.py init log now carries approval_gate_cap=N +
approval_gate_cap_source=threaded|engine_default. Matches the
handler log key so CloudWatch Insights can join across the
cascade; agent can't distinguish blueprint-override from
platform-default-frozen (handler log is the ground truth).
- server.py adds _warn_cw helper routing [server/warn] lines to
a dedicated CloudWatch stream (server_warn/<task_id>). stdout
print is preserved for local dev + existing capsys tests.
AgentCore does not forward container stdout to APPLICATION_LOGS,
so pre-7c warnings about malformed invocation payloads were
invisible in production. Failure counter shared with _debug_cw
for a single alarm surface; hoisted above writer defs for
import-time ordering safety.
- blueprint.ts emits a synth-time info annotation when
security.approvalGateCap is omitted so operators see a signal
that the repo will rely on the platform default of 50. Without
this, the default was a silent fallback at the handler layer —
only visible by inspecting a TaskRecord at runtime.
Tests: agent 694→700 (+6), cdk 1263→1265 (+2), cli unchanged.
Design refs: §4 step 5, §11.1, §13.6, decision #13.
* feat(cedar-hitl): Chunk 8a — extend approval outcome event schema
Add created_at / effective_timeout_s / matching_rule_ids to
approval_granted / approval_denied / approval_timed_out events so
the incoming ApprovalMetricsPublisher Lambda (Chunk 8b) can compute
decision latency and emit a rule_id-dimensioned timeout breakdown
without a round-trip GetItem against TaskEventsTable.
Fields are added conditionally — omitted from metadata when the
caller did not supply them — so the event stream stays free of
null-value noise and legacy callers continue to produce valid
payloads. Publisher handles missing fields via explicit skip-and-log
on the specific metric branch (not fallback-to-zero).
Agent tests extended: +6 progress_writer tests, +3 hooks tests.
Baseline 700 → 710. No consumer wired yet — this commit is a
forward-compatible superset; Chunk 8b ships the CDK publisher +
dashboard widgets.
* feat(cedar-hitl): Chunk 8b — ApprovalMetricsPublisher + native CloudWatch dashboard widgets
Ship the Cedar-HITL dashboard widgets from §11.3 / IMPL-28 via the
MetricsPublisher architecture (Option E):
- New ApprovalMetricsPublisher Lambda consumes TaskEventsTable DDB
stream as consumer #2 (FanoutConsumer is #1; stream is within its
2-consumer soft cap — documented in task-events-table.ts).
- Handler emits CloudWatch EMF for 3 metrics in namespace
ABCA/Cedar-HITL:
* ApprovalRequestCount + ClippedApprovalCount (reason dim) →
ApprovalTimeoutClipRate widget (MathExpression with IF-guard
against NaN on zero-denominator periods)
* TimedOutEffectiveTimeout (rule_id dim with allowlist
cardinality cap) → ApprovalTimeoutBreakdown widget
* ApprovalDecisionLatencyMs (outcome dim) → ApprovalDecisionLatency
widget with per-outcome p50/p90/p99
- Observability-of-observability (silent-failure review):
* MetricsPublisherHeartbeat per batch so dashboard gaps
distinguish "no traffic" from "pipeline broken"
* MetricEmitSkipped with a reason dim on schema mismatches,
parse anomalies, unknown rule ids — never fall back to
latency=0 or count=0 which would poison percentile widgets
* Expected high-volume skip reasons (non-milestone events,
REMOVE records) DO NOT emit MetricEmitSkipped — only
anomaly reasons (missing keys, missing milestone name) do,
so real signal isn't drowned
* Structured log lines alongside every skip so the absence of
metrics is also observable via CloudWatch Logs Insights
- Cardinality caps via ``RULE_ID_ALLOWLIST`` + ``normalizeClipReason``.
Unknown values collapse to ``other`` / ``unknown`` buckets with
dashboard series so the collapse is discoverable rather than
silently accruing custom-metric cost.
- Event-source-mapping filter pattern rejects non-agent_milestone
records at the service layer; handler-layer allowlist catches
anything that slips through. Filter pattern correctness tested
structurally + positively/negatively probed (silent-failure H3).
- Per-record try/catch + reportBatchItemFailures + SQS DLQ mirror
the fanout-task-events.ts poison-pill pattern exactly.
Deferred to Chunk 10 chore issues:
- DLQ alarms (fanout + publisher) — fire-into-void until
notification channel lands, so wire with §11.5 alarms as a group
- Explicit log-group declaration (IAM drift defense)
- stdout-flush race documentation (pre-existing pattern in fanout)
- EMF 100-updates/sec throttle alarm
Tests: cdk 1265 → 1327 (+62); agent 710 (unchanged); cli 217
(unchanged). All pass. §11.5 alarm plumbing now unblocked —
publisher provides the metrics infrastructure the design always
intended; only the notification-channel SNS wiring is left.
* docs(cedar-hitl): Chunk 9 — sync design doc to Chunks 7b / 8a / 8b
Bring CEDAR_HITL_GATES.md current with the code that shipped in
Chunks 7b (approval_gate_cap persist), 8a (outcome event schema
superset), and 8b (ApprovalMetricsPublisher + dashboard widgets):
- §10.2 adds the missing approval_gate_cap row (carry-forward
drift from Chunk 7b). Bounds + frozen-at-submit semantics
documented.
- §11.1 outcome events (approval_granted / approval_denied /
approval_timed_out) now document the Chunk 8a optional fields
(created_at, effective_timeout_s, matching_rule_ids) plus the
publisher's skip-on-missing-field policy.
- §11.1 intro names ApprovalMetricsPublisherFn as consumer #2 and
points to §11.3 for the metric schema.
- §11.3 rewritten to describe the Option E architecture:
publisher Lambda + EMF + native CloudWatch metrics in namespace
ABCA/Cedar-HITL, MathExpression with divide-by-zero guard,
rule_id cardinality cap, observability-of-observability via
heartbeat + skip meta-metrics, widget layout (12/12 over 24),
2-consumer stream budget. Dropped the stale "Retired the old
bundled widget" line — that widget never shipped.
- §11.5 reframed as "deferred (notification-channel gated)" with
a plumbing-status paragraph noting the metric infra now exists;
only SNS wiring remains. Alarm list expanded to include DLQ
and publisher-health alarms.
- §16 IMPL-28 rewritten for Option E; §15.2 row 46 expanded to
reference the 4 new test files; Appendix B checklist updated.
Starlight mirror regenerated via ``cd docs && node
scripts/sync-starlight.mjs``.
No code changes. Test baselines unchanged. Adversarial
comment-analyzer review verified every new claim against
committed code — zero inaccuracies.
* feat(cedar-hitl): Chunk 10 review fixes — close 2 blockers + tighten 2 mediums
Full-branch adversarial review (code-reviewer + silent-failure-hunter
on all 18 commits) surfaced findings that only appear at final-state.
Addressing the blockers + low-cost meds before deploy:
B2 — stranded approvals were invisible to the dashboard:
- Reconciler writes ``event_type: 'task_stranded'``; the metrics
publisher's event-source filter only accepts
``event_type: 'agent_milestone'``, so AWAITING_APPROVAL evictions
produced zero §11.3 signal.
- Fix: reconciler now additionally emits an ``agent_milestone``
with ``milestone: 'approval_stranded'`` when the stranded task
was AWAITING_APPROVAL. Publisher allowlist extended; classifier
emits ``ApprovalStrandedCount`` counter. SUBMITTED / HYDRATING
stranded events unchanged (guarded by test).
B1 — heartbeat comment was false reassurance:
- Event-source filter blocks Lambda invocation when no
``agent_milestone`` records exist in the poll window, so a
quiet period produces the same widget gap as a broken
pipeline. The code + design-doc wording claimed "gap =
pipeline broken" which would mislead the on-call.
- Fix: corrected module + function docstrings to describe the
heartbeat as "present when active, not pipeline-alive-always."
Operators should alarm on the combination
(heartbeat-absent + recent TaskEventsTable traffic) or wire
a scheduled canary — the latter tracked as a §11.5 follow-up.
M1 — safety-critical milestones produced zero dashboard signal:
- ``approval_cap_exceeded`` (§12.9 per-task cap) and
``approval_rate_limit_exceeded`` (per-user per-minute rate)
were emitted by the agent but not on the publisher allowlist.
A production bug where every gate hit the cap would have
been invisible.
- Fix: both added to APPROVAL_METRIC_MILESTONES with
``ApprovalCapExceededCount`` / ``ApprovalRateLimitExceededCount``
counters. No dimensions — the request_id in the event carries
per-user correlation for ad-hoc log-insights investigation.
H2 — filter / handler eventName disagreement:
- Event-source filter required ``INSERT``; handler accepted
``INSERT`` and ``MODIFY``. Benign today (TaskEventsTable is
put-only), but a future chunk MODIFY-ing records would be
silently dropped by the filter while the handler was ready
to process them.
- Fix: handler now INSERT-only, matching the filter. Single
source of truth on the eventName invariant.
M1-rename — ``expected_non_approval_milestone`` skip reason was
misleading (the non-metric approval milestones like
``approval_late_win`` also land in this bucket). Renamed to
``expected_milestone_not_tracked``.
Tests: cdk 1327 → 1332 (+5: 3 classifier branches for new metrics,
1 reconciler AWAITING_APPROVAL path, 1 SUBMITTED-not-double-counted
guard). Agent + CLI unchanged. All pre-commit hooks green; pre-push
security fails only on the 3 pre-existing CVEs tracked for chore
issue filing.
Deferred findings from the same review (file as chore issues):
- H1: agent-dies-between-TIMED_OUT-and-resume loses latency
(edge, affects p99 bias)
- H3: late-win APPROVED created_at staleness invariant
(works today, document invariant)
- H4: _warn_cw daemon-thread burst under adversarial payload
- M2-M4: late-win metric, rename helpers, etc.
No upstream PR filing this chunk — deploy to Sam's AWS account
for integration testing first.
* fix(cedar-hitl): suppress AwsSolutions-IAM5 on Runtime ExecutionRole overflow policies
Synth + deploy were blocked by cdk-nag: the Cedar HITL additions
(TaskApprovalsTable grant + SlackUserMappingTable + extra env vars
threaded to the AgentCore runtime) pushed the runtime ExecutionRole
past CDK's inline-policy size limit, so CDK auto-splits excess
statements into ``OverflowPolicy1``. The overflow inherits the same
wildcard ``bedrock:InvokeModel*`` / CloudWatch actions as the base
policy but lives at a path
(``Runtime/ExecutionRole/OverflowPolicy1/Resource``) that the
existing ``addResourceSuppressions(runtime, ..., applyToChildren:
true)`` cannot reach — CDK creates overflow policies lazily during
synth ``prepare()``, after the construct tree has been frozen and
after static suppressions have been cached.
Suppress via an Aspect at MUTATING priority so the suppression is
applied before cdk-nag's READONLY visitor runs. Matches any path
containing ``/Runtime/ExecutionRole/OverflowPolicy`` + ending
``/Resource`` so future ``OverflowPolicy2``, etc. are covered
without hardcoding indices.
Verified: ``mise //cdk:synth`` now completes cleanly.
``mise //cdk:test`` still 1332/1332.
* fix(cedar-hitl): E2E deploy-readiness — policies bundle + onboarding gate + CLI error visibility
Three E2E T1.4 + T2.2 findings from the Chunk 10 integration-test
session. Batched into one commit since all three need the same
redeploy to verify:
1. agent/Dockerfile: COPY policies/ into the container image.
``PolicyEngine.__init__`` reads
``/app/policies/hard_deny.cedar`` + ``soft_deny.cedar`` at import
time via ``_POLICIES_DIR = Path(__file__).parent.parent /
"policies"``. The Dockerfile only copied ``src/``, so the
directory was missing and every Cedar-HITL task failed at 0 turns
with ``missing built-in hard-deny policies``. Introduced
alongside Chunk 2 when the policy files were first added —
Dockerfile was never updated. Zero tasks on this branch ever
su…
…ompt, cap, parity (#248) #5 (High) — post-hook load_workflow had no fail-soft fallback, so a reload failure AFTER run_agent mutated/committed the tree stranded the work as FAILED with no PR. read_only now comes from config (already fallback-computed and the verdict that drove Cedar); the reload — needed only for the ensure_pr strategy — is wrapped in the same WorkflowValidationError fallback build_config uses, defaulting to "create" so the PR still opens. #8 — knowledge/web-research-v1 had no registered prompt and silently degraded to the generic default-agent prompt. Add a research-specialized, repo-less prompt (prompts/web_research.py) and register it. #9 — the 5 MiB artifact cap was checked AFTER encoding to UTF-8 (a second full copy). Reject on the character count up front (chars ≤ bytes in UTF-8) so the cap actually bounds memory on the MicroVM before materializing the bytes. #10 residual — add a parity test for the agent-side REPO_LESS_DEFAULT_WORKFLOW_ID / DEFAULT_WORKFLOW_ID constants: the repo-less default must match the CDK platform default and be requires_repo:false in the YAML; the coding default must be repo-bound. Closes the last hand-maintained-copy axis #10 named. Perf — memoize load_workflow (@cache): files are image-baked and immutable per process and Workflow is frozen, so the 3-4 loads per task now parse once. Leaves the coding-lane decorative-`gate` item (run_workflow only drives run_agent while verify_build/ensure_pr stay inline) as a tracked follow-up — it is a larger half-migrated-runner refactor both reviewers flagged as non-blocking.
…rkflows (aws-samples#248) (aws-samples#296) * Initial design docs * docs(workflows): address principal-review findings on ADR-014 - Validator parity: declare JSON Schema as the single canonical shape contract (consumed, not re-implemented), keep exactly one cross-field validator impl in Phases 1-3, and add a contracts/workflow-validation/ golden corpus to lock any Phase-4 second impl to parity (mirrors the cedar-parity mechanism). - Cedar principal migration: split into its own isolated PR (Phase 2a) reviewed alone with regenerated parity fixtures, landing ahead of the pr_iteration/pr_review workflow migrations (Phase 2b) that depend on it. - Repo-optionality: reframe web_research as a not-yet-runnable schema fixture (not an acceptance proof) and require the two blocking open questions (memory actorId, artifact delivery) to be resolved as a recorded ADR addendum before the Phase-0 schema is frozen. Regenerated Starlight mirrors. * feat(agent): workflow models + loader (aws-samples#248 Phase 1) Add the agent/src/workflow/ package: Pydantic models mirroring workflow.schema.json and a loader that resolves a first-party workflow id, parses YAML, and shape-validates against the canonical JSON Schema. Additive only — does not yet touch task_type (the breaking cutover lands later in Phase 1). - models.py: Workflow + sub-models; resolved_requires_repo applies the domain-derived default (coding=true, knowledge/hybrid=false). - loader.py: JSON-Schema shape validation (one canonical contract, consumed not re-implemented), YAML load, id/path agreement, and path-traversal defense. Cross-field rules deliberately live in the separate validator (Task #2), so there is one cross-field impl. - Promote pyyaml + jsonschema to direct deps (were only transitive); the loader must not depend on another package's transitive pin. - Fix a latent schema bug found by the new tests: the requires_repo/ read_only allOf conditionals fired vacuously when the property was absent, wrongly applying repo-less constraints to a coding workflow relying on the domain default. Guard each `if` with `required`. 17 new tests; agent ruff + ty + full suite (836) green. * feat(agent): cross-field workflow validator + golden corpus (aws-samples#248 Phase 1) The single CI-time implementation of the WORKFLOWS.md cross-field rules the JSON Schema cannot express (rules 1-9, 11, 12, 14). The runtime loader stays shape-only and trusts this verdict, so there is exactly one cross-field implementation in Phases 1-3 (avoids the cedar-style two-language drift). contracts/workflow-validation/ ships the golden parity corpus from Phase 1 so the expected-verdict contract is fixed before aws-samples#246 adds a second validator, mirroring contracts/cedar-parity/. test_workflow_tree_valid.py gates every first-party workflow file through the validator at CI time. * feat(agent): workflow step runner + handler registry (aws-samples#248 Phase 1) The agent-side step runner that interprets workflow.steps inside the container (ADR-014): a StepKind→handler registry, in-order execution honouring each step's on_failure policy (fail / continue / skip_remaining), per-step step:<name>:start/<status> progress milestones, and resume-aware checkpointing (deterministic side-effect-free steps recorded to workflow_state.json on the persistent mount are skipped on resume; agentic/side-effecting steps re-run idempotently per WORKFLOWS.md §Step execution semantics). Handlers are thin wrappers over the existing helpers (setup_repo, run_agent, verify_build/lint, ensure_pr) so this is maximal structural change, minimal logic change. post_review/deliver_artifact are registered but raise NotImplementedError (Phase 2b / Phase 3) — fail-loud, not silent no-op, keeping validator rule-8 handler parity honest. Orchestration core is unit-tested with fakes; wiring into pipeline.run_task is task 5. * feat(agent): first-party workflow files — coding/new-task-v1 + default/agent-v1 (aws-samples#248 Phase 1) Migrates the new_task task type to coding/new-task-v1.yaml (the heavyweight clone→build→open-PR coding path) and ships the platform default workflow default/agent-v1.yaml — the conservative last rung of the resolution ladder used when no workflow_ref and no Blueprint default apply (requires_repo:false, read-leaning tool set, deliver-as-comment). pr_iteration/pr_review migrations are Phase 2b (depend on the Cedar principal migration). Both pass the cross-field validator and the test_workflow_tree_valid CI gate (file path agrees with declared id; zero violations). * fix(agent): address code-review findings in workflow validator + runner (aws-samples#248 Phase 1) Fixes the group-A bugs surfaced by the high-recall code review of the Phase 1 commits: runner.py - Resume-skip product loss: clone_repo/hydrate_context populate in-memory products (ctx.setup, ctx.user_prompt) that can't be rebuilt from the JSON checkpoint, so skipping them on resume left ctx.setup None (breaking ensure_pr) and ctx.user_prompt empty (unguided agent). Narrow _RESUMABLE_SKIP_KINDS to verify_build/verify_lint — steps whose ENTIRE product is the checkpointed boolean re-applied via ctx.artifacts. clone_repo/hydrate_context now re-run on resume (handler-level idempotency), matching WORKFLOWS.md's intent. - Skipped steps now emit step:<name>:start / :skipped milestones so a watcher sees them accounted for instead of a gap. validator.py - _HANDLER_KINDS (rule 8) is now derived from runner.STEP_HANDLERS — the single source of truth — instead of a hand-copied list that could silently drift. - rule-6 tier ceiling now applies the elevated-only-fields check to read-only (the strictest tier) as well as standard; previously read-only could declare mcp_servers/plugins/skills unchecked. - rule-11 now verifies a deliver_artifact step's target actually produces the declared comment/artifact outcome, not merely that the step kind is present. Tests + corpus updated; rule3/rule7 fixtures switched to s3_and_comment target to stay isolated to their intended rule, plus a new rule11 target-mismatch fixture. Full agent suite green (909 passed), ruff + ty clean. * feat(agent): wire workflow runner into pipeline behind a gate (aws-samples#248 Phase 1, task 5) Builds the runner→pipeline seam and gates it off (default production behavior unchanged). pipeline.run_task's agent invocation now goes through _execute_agent_step: when WORKFLOW_RUNNER_ENABLED is set AND task_type maps to a shipped workflow (only new_task→coding/new-task-v1 in Phase 1), the single run_agent step is dispatched through workflow.run_workflow — exercising the real handler registry, step milestones, and result threading — while clone, context assembly, and post-hooks stay on the proven inline path. Otherwise it is exactly the legacy asyncio.run(run_agent(...)) call. pr_iteration/pr_review have no workflow file until Phase 2b, so they fall back to inline. run_workflow gains an only_kinds filter so the seam drives just the run_agent step (no double clone / double PR against the inline post-hooks). The flag is flipped to default-on in the tasks 6-8 cutover. Folds in the group-B code-review fixes (effective when the seam is enabled): - system prompt: hydrate_context builds ctx.system_prompt via the existing build_system_prompt (was empty), reusing pipeline's logic rather than reimplementing it. - verify gate: new shared _gate_status gives verify_build/verify_lint matching semantics — informational/read_only never gate, regression_only consults build_before/lint_before (mirrors pipeline's build_ok = passed or not build_before), strict always gates. Fixes the regression_only build bug and the verify_lint read_only asymmetry. - clone_repo is idempotent (reuses a pre-populated ctx.setup). - StepContext threads the --trace trajectory into run_agent (was dropped). ensure_pr strategy reconciliation stays deferred to task 8 (it requires removing the task_type branch inside ensure_pr); recorded in local-docs. Full agent suite green (930 passed), ruff + ty clean. * fix(agent): address second-round review findings on the workflow seam (aws-samples#248 Phase 1) - Error-handling contract (highest severity): when the run_agent step's handler raises, run_workflow captures it into a failed StepOutcome rather than propagating. _execute_agent_step now RE-RAISES in that case so run_task's except block restores full fidelity — the log_error_cw APPLICATION_LOGS mirror (read by TaskDashboard / bgagent status), the span error, and the real exception text — instead of silently downgrading to a generic AgentResult. This also makes the previously-dead agent_result-is-None branch live. - _gate_status: an unset gate now defaults to regression_only semantics (gate only a regression; a pre-existing failure does not gate), matching pipeline.py which is unconditionally 'build_ok = passed or not build_before'. Previously an unset gate fell through to strict, contradicting the docstring's 'mirrors pipeline.py' claim and would wrongly fail a broken-before repo. - run_agent handler fails loud on an empty system prompt rather than running an unguided agent loop (repo-less prompt assembly is Phase 3; this turns the gap into an attributable failed step instead of a silent context-free run). - _workflow_id_for_task_type: documented as a transitional bridge tied to the canonical task_type→workflow table in WORKFLOWS.md / API_CONTRACT.md, to be kept in sync until tasks 6-8 delete it. Tests added for each behavior change. Full agent suite green (934 passed), ruff + ty clean. * feat(api,cli): replace task_type with workflow_ref/resolved_workflow (aws-samples#248 tasks 6-7) Breaking API change (no alias): task_type is removed end-to-end; workflow_ref selects a workflow and resolves to a pinned {id, version} at the create-task boundary. CDK/CLI types (task 6, in lockstep — check:types-sync): - Remove TaskType + isPrTaskType; add ResolvedWorkflow. - CreateTaskRequest.workflow_ref?; TaskRecord/TaskDetail/TaskSummary resolved_workflow (+ mappers). - CLI: --pr/--review-pr now set workflow_ref (coding/pr-iteration-v1, coding/pr-review-v1); new --workflow flag; format reads resolved_workflow.id. CDK resolution boundary (task 7): - New workflows.ts: CDK-side descriptor mirror of agent/workflows/** + resolveWorkflowRef (ladder: explicit ref → default/agent-v1), isValidWorkflowRef, requires_repo/read_only/uses_pr accessors. Drift-guard test keeps the table in sync with the shipped YAML. - validation.ts: drop VALID_TASK_TYPES/isValidTaskType; hasTaskSpec now takes the resolved workflow's required_inputs contract. - create-task-core.ts: resolve workflow, validate inputs against the contract, persist workflow_ref + resolved_workflow. Repo stays required for ALL workflows in Phase 1 (repo-less admission is Phase 3). - preflight.ts: taskType param → readOnly + requiresRepo; repo-less short-circuits to passed (seam for Phase 3). - orchestrate-task/context-hydration/orchestrator/ecs-strategy: derive read_only/requires_repo/PR-ness from resolved_workflow; payload swaps task_type → resolved_workflow. ecs bootCommand passes resolved_workflow (lockstep with agent run_task in task 8). CLI 294 tests + CDK suite green; check:types-sync OK. Agent cutover (task 8) lands next to consume resolved_workflow. * feat(agent): complete task_type→workflow cutover in the agent (aws-samples#248 task 8) Removes the Python TaskType enum / PR_TASK_TYPES / _PROMPTS-by-task_type and the gated seam; the workflow runner is now the sole agentic path, driven by the resolved_workflow {id, version} threaded from the create-task boundary. - models.py: delete TaskType enum; TaskConfig gains resolved_workflow, policy_principal, is_pr_workflow (drops task_type). - config.py: build_config takes resolved_workflow (not task_type); derives policy_principal via workflow.policy_principal_for and is_pr_workflow; PR_TASK_TYPES → PR_WORKFLOW_IDS. - Cedar UNCHANGED (Phase 2a owns the principal migration): the Agent::TaskAgent::"<id>" scheme and contracts/cedar-parity/ are untouched. policy_principal_for maps read_only⇒pr_review, else id→legacy {new_task, pr_iteration, pr_review}. Only policy.py edit: drop the WARN-only TaskType() validation. runner.py passes config.policy_principal as the principal. All test_policy* pass unchanged. - prompts rekeyed by workflow id (get_system_prompt falls back to the default coding prompt for an unknown id rather than raising). - pipeline.py: _execute_agent_step loads from resolved_workflow, always-on; the 3 task_type=='pr_review' branches → workflow.read_only; ensure_pr takes an explicit strategy (create/push_resolve/resolve) from the workflow step, resolving the deferred code-review item. - post_hooks.ensure_pr: strategy param replaces task_type self-inspection. repo.py: PR-branch resume keys off config.is_pr_workflow. - server.py/entrypoint.py: thread resolved_workflow; validation keyed on PR workflow ids. - New faithful Phase-1 workflow files coding/pr-iteration-v1 (push_resolve) + coding/pr-review-v1 (read_only, resolve, no Write/Edit per rule 4). All 4 first-party files validate; CDK descriptor drift-guard green. - docs: API_CONTRACT.md migration table + resolved_workflow responses; Starlight mirror synced. Agent suite 922 passed, ruff + ty clean. With tasks 6-7 (e829c4f) this completes the breaking cutover across API + CLI + agent. * fix(agent): use ty ignore syntax + annotate workflow test handlers (aws-samples#248) The full build was never completed on this branch; ty type-checking and ruff format gates were red. No src logic changes — type/format hygiene only. - Tests used mypy-style `# type: ignore[arg-type]`, but the project type checker is ty (`# ty: ignore[rule]`), so the suppression silently no-op'd. - test_workflow_runner.py: annotate inline step handlers as StepHandler-compatible `(step: Step, ctx: StepContext) -> StepOutcome`; type the `handlers` dicts as `dict[str, StepHandler]` (ty treats the alias params as positional-only → dict-value invariance rejects named-param funcs); convert the stale ignore to `# ty: ignore[invalid-argument-type]`. - test_entrypoint.py: assert `resolved_workflow is not None` before subscripting (TaskConfig.resolved_workflow is `dict | None`). - Apply ruff format reflows (runner.py, validator.py, test_workflow_tree_valid.py) the branch was committed without. * feat(agent): Cedar read-only enforcement keys off context.read_only (aws-samples#248 Phase 2a) Migrate read-only enforcement from the literal `Agent::TaskAgent::"pr_review"` principal match to the `context.read_only` attribute, so the Write/Edit hard-deny attaches to the *property* and fires for every read-only workflow uniformly — not just coding/pr-review. Removes the Phase-2b principal bridge. This is the security-load-bearing step: an error here *silently weakens* enforcement (the rule stops matching) rather than failing loudly. Guarded by new contracts/cedar-parity/ golden fixtures verified on BOTH the cedarpy (Python) and cedar-wasm (TypeScript) engines. Policy (both bindings, kept byte-for-byte identical — drift guard enforces): - agent/policies/hard_deny.cedar + cdk/.../builtin-policies.ts: pr_review_forbid_write/edit (principal == "pr_review") → read_only_forbid_write/edit (when context.read_only == true, any principal). Wiring: - policy.py: PolicyEngine gains read_only kwarg; threads read_only into every Cedar request context (probe + base_context). - models.py: TaskConfig.read_only; config.py derives it from the resolved workflow; runner.py passes config.read_only to PolicyEngine. - workflow/loader.py: remove the read_only ⇒ "pr_review" bridge in policy_principal_for — the principal is now an identity/audit tag only; pr-review keeps its own id-derived principal. Parity fixtures (new): read-only-forbid-write, read-only-forbid-edit (read_only=true ⇒ deny), read-only-false-permits-write (read_only=false ⇒ base_permit — proves the deny is gated on the property, not always-on). Tests: TestPrReviewPermissions → TestReadOnlyPermissions, now asserting the deny fires for any read-only principal AND that read_only=false permits Write; test_hooks + test_config updated. agent 927 / cdk 1822 green; ty + ruff clean. Docs: ADR-014 addendum (2026-06-08) records dropping the isolated-PR/ordering requirement (2b shipped first behind the bridge, so read-only was never unprotected); WORKFLOWS.md §"Replacing the Cedar principal" + phasing table updated to past tense. * docs(workflows): resolve repo-optional open questions + freeze schema (aws-samples#248) ADR-014 addendum (2026-06-08) settles the two open questions that gated the Phase-0 schema freeze (WORKFLOWS.md open questions #1, #2). With the one implied schema reshape applied, the schema is frozen — later phases add handlers and plumbing, not schema fields. Decision 1 — Memory actorId for repo-less tasks: per-user user:{cognito_sub}. Caller-scoped (no cross-tenant bleed; mirrors the per-user trace prefix); cross- workflow pooling explicitly not adopted. NO schema field — a fixed platform fallback, applied in memory.py in Phase 3 when repo is absent. Decision 2 — Artifact delivery: named Python deliverers (open target string), shared S3 plumbing pinned now. deliver_artifact.target resolves to a registered deliverer (workflow/deliverers.py → DELIVERERS), same pattern as STEP_HANDLERS — a new delivery method is a registered deliverer, not a schema change. Plumbing frozen: task-scoped key artifacts/{task_id}/, prefix-scoped SessionRole grant, size limit, TaskDetail URL surfacing, workflow:{id} repo-less tenant tag. Schema reshape applied: - workflow.schema.json: steps[].target drops its enum (stays type:string, minLength:1); the closed set moves into the deliverer registry. - models.py: DeliverTarget Literal → str. - New workflow/deliverers.py: Deliverer dataclass + DELIVERERS registry + produced_outcomes(); single source of truth for "what each target produces." - validator.py rule 11: consults the registry (_deliverer_produces) instead of the hardcoded _DELIVER_TARGET_OUTCOMES map. The three first-party names keep their exact produced-outcome sets, so NO existing workflow / fixture / golden vector changes verdict; an unknown deliverer name now produces nothing (new guard, tested). Tests: test_workflow_deliverers.py (registry contract) + a rule-11 unknown-deliverer case. agent 932 green; ty + ruff clean. deliver_artifact remains a NotImplementedError stub until Phase 3 — only the contract is frozen. * feat(api,agent): repo-optional admission for repo-less workflows (aws-samples#248 Phase 3) First slice of Phase 3 (repo-optional tasks): the admission path now accepts a submission with no repo when the resolved workflow does not require one, so a repo-less knowledge workflow (e.g. default/agent-v1) can be admitted, hydrated, and run from task_description alone. The deliverer/S3 + agent-runtime clone-skip slices follow separately. Semantics: requires_repo decides whether a repo is MANDATORY; requires_repo:false means repo-OPTIONAL (a repo-less workflow may still target a repo). The repo-less behavior keys off repo *absence*, not the workflow flag — a repo-optional workflow given a repo still takes the repo-bound path. Types (CDK + CLI mirror, kept in sync — check:types-sync green): - CreateTaskRequest.repo, TaskRecord.repo → optional; TaskDetail.repo, TaskSummary.repo → string | null. Mappers null-coalesce. Admission (create-task-core.ts): resolve the workflow FIRST, then gate repo on workflow.requiresRepo (missing repo on a repo-bound workflow → 400). Onboarding + blueprint-cap lookup runs only when a repo is present; repo-less takes the platform default cap. Record omits repo when absent; replay-required-fields drops 'repo'; event metadata null-coalesces. Hydration (context-hydration.ts): new repo-less branch (keyed on !task.repo) that assembles from task_description + per-user memory, returning before the repo-bound fetch path; assembleUserPrompt drops the Repository: line when repo is absent. Memory (memory.ts + orchestrator.ts): loadMemoryContext/writeMinimalEpisode take an actorNamespace (repo for coding, user:{user_id} for repo-less — ADR-014 addendum). Preflight already honored requiresRepo (Phase-1 seam) — widened its repo param to string|undefined. fanout-task-events skips the GitHub channel for a repo-less task. loadBlueprintConfig skips the RepoTable lookup when repo is absent. Agent (config.py): build_config loads the workflow up-front and requires repo_url/github_token only when requires_repo; repo-less runs from task_description. CLI shows '—' for a repo-less task. Tests: repo-less acceptance (create-task-core, create-task, context-hydration, CLI format) + repo-bound-missing-repo rejection; memory test log key repo → actor_namespace. agent 932 / cdk 1825 / cli 296 green; ty + ruff + eslint clean. deliver_artifact is still a NotImplementedError stub — a repo-less task can be admitted and hydrated but not yet deliver an artifact (next slice). * feat(agent): repo-less pipeline path via the workflow runner (aws-samples#248 Phase 3) Second Phase-3 slice: a repo-less workflow now runs the agent loop in-container with no clone / build / PR. When config.requires_repo is false, run_task delegates to _run_repoless_task, which drives the workflow's steps (hydrate_context → run_agent) through the workflow step runner — coding vs knowledge now differ by the workflow's steps list, per ADR-014's end-state — and assembles a terminal TaskResult with pr_url=None. Scope boundary (deliberate): only hydrate_context + run_agent are driven here. deliver_artifact is the workflow's terminal step but its handler is a stub until the artifacts-bucket S3/IAM contract ships (next slice), so it is excluded via only_kinds; delivery is deferred, the agent run is not. The coding path is left byte-identical — its hard-won inline cancel-safety (skip ensure_pr on a cancelled task) is NOT yet runner-driven, so routing the full coding step list through the runner is a separate cancel-aware change, intentionally not done here. - models.py: TaskConfig.requires_repo (defaults True; coding unaffected). config.py build_config sets it from the resolved workflow. - pipeline.py: requires_repo branch before the repo-coupled segment; _run_repoless_task runs the runner + assembles/persists the terminal result. run_task repo_url now defaults to "" (repo-less callers omit it). - prompts/default_agent.py: repo-less system prompt (no git/branch/PR placeholders); registered for default/agent-v1. - prompt_builder.build_repoless_system_prompt + shared _render_memory_context. Tests: repo-less pipeline run (no setup_repo, no PR, success, repo-less prompt); repo-less prompt has no repo placeholders. agent 934 green; ty + ruff clean. * fix(agent): address PR-review findings on Phase 2a/3 (aws-samples#248) Findings from the plugin review (code-reviewer, silent-failure-hunter, pr-test-analyzer, type-design-analyzer) on 4c34c8b..HEAD. HIGH — repo-less task reported COMPLETED while delivering nothing (silent-failure-hunter). _run_repoless_task skips deliver_artifact (stub until the artifacts contract ships), but the workflow's terminal outcome (e.g. `comment`) was never produced, yet status was success. Now: if the primary terminal outcome is delivery-backed and delivery was skipped, the task is a loud FAILED with an explicit error + delivery_deferred milestone naming the gap. MEDIUM — config.py load-failure fallback bug + fail-open (code-reviewer, silent-failure-hunter). The WorkflowValidationError branch compared workflow_id against DEFAULT_WORKFLOW_ID (the *coding* default) for requires_repo, contradicting its comment, and defaulted read_only=False (fail-open) for unknown ids. Now: log the failure (was silent); fail CLOSED — read_only=True for any id not in _KNOWN_WRITEABLE_WORKFLOW_IDS; requires_repo keyed off the real repo-less default (REPO_LESS_DEFAULT_WORKFLOW_ID = default/agent-v1). MEDIUM — TaskConfig missing cross-field invariant (type-design-analyzer). Added _validate_requires_repo_has_repo (mirrors _validate_trace_requires_user_id): requires_repo=True ⇒ non-empty repo_url, enforced at construction. repo_url and github_token now default to "" so a repo-less TaskConfig is constructible; the validator preserves the repo-bound invariant. LOW/secondary — rule-8 now also checks deliver_artifact.target resolves in DELIVERERS (type-design-analyzer), so an unknown target is caught universally, not only when it collides with the primary outcome. Test gaps closed (pr-test-analyzer): repo-OPTIONAL workflow given a valid repo (admission + hydration); repo-less memory keyed user:{user_id} (asserted); malformed-repo-on-repo-bound rejection; preflight repo-less short-circuit + repo-bound-no-repo invariant; repo-less agent_result=None → FAILED; config load-failure fail-closed; TaskConfig validator; rule-8 deliver target. agent 941 / cdk 1831 / cli 296 green; ty + ruff + type-sync clean. * feat(agent,api): deliver_artifact + repo-less memory — close criterion aws-samples#4 (aws-samples#248 Phase 3) Final Phase-3 slice: a repo-less knowledge task now runs end-to-end — admitted, hydrated, agent loop, AND delivers its output. Closes acceptance criterion aws-samples#4 (a non-coding task runs without a repo) for the s3/comment deliverers. Infra (IAM + env): - agent-session-role.ts: s3:PutObject grant scoped to artifacts/${aws:PrincipalTag/task_id}/* on the trace-artifacts bucket (mirrors the traces/ pattern; task_id-scoped per ADR-014 addendum). cdk-nag reason updated. - agent.ts: ARTIFACTS_BUCKET_NAME runtime env (same bucket as traces). Deliverers (agent/src/workflow/deliverers.py): - DELIVERERS gains a deliver(target, ctx) dispatcher + DeliveryResult. The s3 deliverer uploads the agent's result text to artifacts/{task_id}/result.md via the tenant-scoped S3 client (5 MiB cap); the comment deliverer records a delivered_comment milestone for the channel fanout; s3_and_comment does both. - AgentResult.result_text captures ResultMessage.result (the knowledge-task deliverable); runner.py populates it on success. - runner._handle_deliver_artifact: now implemented (was NotImplementedError), routes through deliver(); a delivery failure is a failed step (terminal FAILED), never a silent skip. Pipeline (pipeline.py): - _run_repoless_task drives the FULL step list (deliver_artifact included); replaces the delivery-deferred FAILED gate with the real runner verdict (agent succeeded but workflow failed ⇒ attributed FAILED). Sets TaskResult.artifact_uri. - Repo-less episodic memory write keyed user:{user_id} (fail-open). Memory (memory.py): write_task_episode repo→actor param; _validate_actor accepts owner/repo OR user:{id} so the same write path serves coding + knowledge tasks. Types: TaskResult/TaskRecord/TaskDetail + CLI mirror gain artifact_uri (mirrors trace_s3_uri); CLI detail view shows an Artifact: line. check:types-sync green. Tests: deliverer dispatch (s3 upload + key/body, comment milestone, s3_and_comment, missing bucket, empty text, oversize); repo-less pipeline now succeeds + sets artifact_uri=null for comment-only; session-role artifacts grant; deliver_artifact dropped from the fail-loud stub list. agent 947 / cdk 1832 / cli 296 green; synth + cdk-nag clean; ty + ruff + type-sync clean. * feat(agent): make repo-less delivery retrievable + ship a knowledge workflow (aws-samples#248 Phase 3) Three should-fix items closing the gap where "criterion aws-samples#4" was met by machinery but the default repo-less path delivered to a channel that isn't wired. 1. Stale docstring: _run_repoless_task said deliver_artifact was a deferred stub — it ships now. Updated to describe the real full-step-list run. 2. Default workflow delivers retrievably. The channel fan-out only knows slack/email/github; an api-origin repo-less task (the common default case) has no channel, so a `comment`-only deliverer surfaced nothing the user could get. default/agent-v1 now uses `target: s3_and_comment` (primary outcome `artifact`): the S3 upload to artifacts/{task_id}/ is always retrievable, the comment milestone is still rendered by the fan-out when a channel exists. 3. Ship a runnable knowledge workflow: agent/workflows/knowledge/web-research-v1.yaml — the concrete repo-less demonstration (research → S3 artifact, no repo). Built-in capabilities only (registry MCP/skills are Phase 4); CDK descriptor mirror + drift-guard + resolver tests updated. Also fixes a latent bug the knowledge workflow surfaced: a repo-less id with no registered prompt fell back to the CODING prompt (leaking unsubstitutable {repo_url}/{branch_name}). get_system_prompt gains repo_less= so repo-less workflows fall back to the repo-less default-agent prompt instead. Tests: repo-less pipeline now asserts artifact_uri set (s3_and_comment); repo-less prompt fallback; knowledge workflow validates/resolves. agent 949 / cdk 1833 green; ty + ruff + type-sync clean; docs synced. * fix(agent): address full-branch review — 2 repo-less blockers + cleanups (aws-samples#248) Findings from the four-reviewer full-branch pass. Two were blockers that would have shipped the repo-less feature broken despite green unit tests (the tests didn't cross the server/task_state boundaries). BLOCKER 1 — repo-less task rejected on the AgentCore backend. server.py _validate_required_params required repo_url unconditionally, so /invocations returned 400 for every repo-less task before the pipeline ran (ECS backend dodged it by calling run_task directly). Now gates repo_url on the workflow's requires_repo (resolved via load_workflow; fails SAFE — assume required — on a load error). Test: repo-less payload accepted, repo-bound still requires repo. BLOCKER 2 — artifact_uri computed but never persisted. task_state.write_terminal's field allowlist had no artifact_uri branch, so the URI was dropped before DynamoDB and TaskDetail.artifact_uri was always null (delivery not retrievable despite the S3 object existing). Added the persist branch + tests. Secondary: - comment deliverer no longer overstates: _post_comment docstring + WORKFLOWS.md state plainly that delivered_comment is recorded for the event stream (visible in `bgagent watch`) but is NOT yet routed to an external channel (not in the fan-out ROUTABLE_MILESTONES). comment_posted=True only when actually recorded. - test pins the config→engine read_only seam (runner.py PolicyEngine(read_only=)): dropping it now fails a test instead of silently disabling the Write/Edit deny. - stale phase-boundary comments corrected (post_review docstring, runner module docstring + only_kinds + run_agent guard, pr-review/pr-iteration YAML headers) — they described Phase 2a/2b/3 as pending when those shipped on this branch. - ADR-014: replaced the dangling pre-rebase commit hash 838c72e (unreachable after rebase) with a descriptive reference. agent 954 / cdk 1833 green; ty + ruff + type-sync clean; docs synced. * test(cli): cover artifact_uri render lines (aws-samples#248 Phase 3) The Artifact: display lines added to formatTaskDetail / formatStatusSnapshot introduced uncovered truthy branches that dropped CLI branch coverage below the 84% gate (full `mise run build` caught it; the per-package run had passed on the prior content). Adds non-null + null cases for both renderers, mirroring the existing trace_s3_uri tests. cli 300 passed; full build green. * fix(agent): address PR review findings + green CI (aws-samples#248) CI: ruff-format the 4 files that tripped the build self-mutation (config.py, deliverers.py, test_pipeline.py, test_task_state.py) so "Fail build on mutation" passes. Review findings: - prompts/__init__.py: get_system_prompt now logs WARN when an id has no registered prompt and falls back, restoring the signal the old build_system_prompt emitted (silent-failure-hunter). - pipeline._run_repoless_task: chain wf_result.failed_step.error into the synthesized AgentResult so a run_agent failure on the repo-less path is diagnosable instead of a generic message (code-reviewer). - server._validate_required_params: narrow the broad `except Exception` around load_workflow to WorkflowValidationError/ImportError so a genuine programming error surfaces loudly instead of being masked as "could not resolve requires_repo" (still fails safe — repo required). - workflows.ts workflowIsReadOnly: document why the unknown-id default is `false` (conservative admission: needsWrite=!readOnly) and must NOT be "aligned" to the agent-side fail-closed read-only default. - runner.py / pr-review-v1.yaml: comment fixes — artifact_key→artifact_uri, drop never-written review_posted, rule 4→6, post_review handler is registered-but-unimplemented not "no shipped handler" (comment-analyzer). Tests (pr-test-analyzer gaps): - test_memory.py: cover _validate_actor (user:{id} namespace + rejects) and a write_task_episode(actor="user:...") reaching create_event. - test_pipeline.py: cover the repo-less delivery-gate-failure branch (agent succeeds but deliver_artifact fails → terminal FAILED naming the step). Docs: ROADMAP.md "Task types" now reflects workflow-driven tasks + repo-less knowledge workflows; Starlight mirror regenerated. agent 962 passed, ruff+ty clean, cdk compile+workflows tests green. * fix(agent): ship first-party workflow files in the image + sync user docs (aws-samples#248) The Dockerfile copied agent/src, agent/policies, and contracts but never agent/workflows, so the runtime loader (which resolves /app/workflows/<domain>/ <name>-vN.yaml) failed every workflow load with WorkflowValidationError. Add COPY agent/workflows/ /app/workflows/ so the five shipped workflows + the JSON schema land in the image. Verified in a local build: load_workflow( 'coding/new-task-v1') resolves to 1.0.0. Also bring the user-facing guides in line with the workflow-driven model: USER_GUIDE/QUICK_START now document workflow_ref, the --workflow flag, and resolved_workflow responses (the retired task_type is shown only as a migration mapping). Rename the "Task types" section to "Workflows", update the Starlight sync map + sidebar slug, and regenerate the mirrors. * fix(agent,api,cli): address PR-review findings on the workflow seam (aws-samples#248) Resolve five verified findings from the aws-samples#248 review: HIGH — thread agent_config.allowed_tools to the SDK. runner.py hardcoded the full tool surface, so a workflow's declared allowed_tools never reached ClaudeAgentOptions and the read_only:false read-leaning lanes (default/agent-v1, web-research-v1) could still get Write/Bash. Add TaskConfig.allowed_tools (populated from the resolved workflow), resolve it in run_agent via _resolve_allowed_tools (drops Write/Edit when read_only), and add SDK-layer tests. Cedar missing-read_only fail-open — add the ADR-014-promised parity vector. With context omitting read_only, both engines fail OPEN (Allow + base_permit); safety rests on policy.py::_eval_tier re-raising on diagnostics.errors. Add contracts/cedar-parity/read-only-missing-attribute.json (verified on cedarpy AND cedar-wasm) plus test_policy.py tests pinning the re-raise + always-inject discipline. MEDIUM — repo-less artifact success gate. pipeline.py now fails the task when the primary outcome is `artifact` but no artifact_uri was produced (matches the WORKFLOWS.md "agent-success AND S3 key present" contract), not just when the deliver step raised. MEDIUM — rule 13 model allow-list admission. Add WORKFLOW_MODEL_ALLOWLIST + disallowedWorkflowModel() and enforce it in create-task-core (unpermitted model => 400, no silent downgrade); descriptor sync-test asserts modelId matches the YAML and stays on the allow-list. LOW — CLI repo-less submission. --repo is no longer a hard requiredOption: required unless --workflow selects a repo-less workflow (still required with --pr/--review-pr). Lets bgagent reach knowledge/web-research-v1. Rule 10 (one production version per lineage) remains intentionally unenforced at the single-file layer — it is a registry/promotion-time property. * fix(agent,api,cli): address PR-review aws-samples#296 resolution-ladder + deliverer findings (aws-samples#248) Resolve the remaining findings from isadeks's review (findings #2/aws-samples#4 were already fixed in the prior commit): #1 — bare submit silently stopped opening PRs. The server ladder resolves an absent workflow_ref to the repo-less default/agent-v1 (by design), so a CLI `submit --repo X --task Y` regressed from "opens a PR" to "S3 artifact". The CLI now sends coding/new-task-v1 explicitly when a repo is present and no workflow/ pr flag is given, preserving the old new_task default. aws-samples#3 — repo-optional + repo disagreed across the boundary. The agent took the repo-less path off config.requires_repo alone, while the orchestrator built a repo-bound prompt off repo presence. pipeline.run_task now takes the repo-less path only when requires_repo is false AND no repo was supplied; a repo-optional workflow given a repo runs the repo-bound path (clone/build/PR), matching CDK admission and the prompt. aws-samples#6 — resolveWorkflowRef silently discarded a @Version constraint. It now honors the constraint (Phase 1: must match the single shipped version) and returns null otherwise; create-task-core 400s with a distinct "version not available" message via resolveWorkflowRefError, instead of quietly running the shipped version. aws-samples#7 — deliver_artifact unset-target default disagreed between validator (lenient full set) and runtime (s3). Introduce DEFAULT_DELIVER_TARGET as the single source of truth; produced_outcomes(None) now models that exact default, so a primary:comment workflow that omits target is correctly flagged by rule 11. aws-samples#10 — descriptor/YAML drift had no parity test. The sync test now asserts required_inputs (allOf/oneOf) parity in addition to id/version/requires_repo/ read_only/model, and a new test cross-checks the agent-side _KNOWN_WRITEABLE_WORKFLOW_IDS (the third hand-maintained copy) against the writeable repo-bound descriptors so drift fails CI. * fix(agent): close PR-review aws-samples#296 follow-ups — post-hook fail-soft, prompt, cap, parity (aws-samples#248) aws-samples#5 (High) — post-hook load_workflow had no fail-soft fallback, so a reload failure AFTER run_agent mutated/committed the tree stranded the work as FAILED with no PR. read_only now comes from config (already fallback-computed and the verdict that drove Cedar); the reload — needed only for the ensure_pr strategy — is wrapped in the same WorkflowValidationError fallback build_config uses, defaulting to "create" so the PR still opens. aws-samples#8 — knowledge/web-research-v1 had no registered prompt and silently degraded to the generic default-agent prompt. Add a research-specialized, repo-less prompt (prompts/web_research.py) and register it. aws-samples#9 — the 5 MiB artifact cap was checked AFTER encoding to UTF-8 (a second full copy). Reject on the character count up front (chars ≤ bytes in UTF-8) so the cap actually bounds memory on the MicroVM before materializing the bytes. aws-samples#10 residual — add a parity test for the agent-side REPO_LESS_DEFAULT_WORKFLOW_ID / DEFAULT_WORKFLOW_ID constants: the repo-less default must match the CDK platform default and be requires_repo:false in the YAML; the coding default must be repo-bound. Closes the last hand-maintained-copy axis aws-samples#10 named. Perf — memoize load_workflow (@cache): files are image-baked and immutable per process and Workflow is frozen, so the 3-4 loads per task now parse once. Leaves the coding-lane decorative-`gate` item (run_workflow only drives run_agent while verify_build/ensure_pr stay inline) as a tracked follow-up — it is a larger half-migrated-runner refactor both reviewers flagged as non-blocking. --------- Co-authored-by: bgagent <bgagent@noreply.github.com>
Summary
Introduces a
ComputeStrategyabstraction that decouples the orchestrator from any specific compute backend. Implements two strategies behind this interface:compute_type: 'ecs'in blueprint configKey changes
ComputeStrategyinterface (compute-strategy.ts):startSession,pollSession,stopSessionwithSessionHandlefor durable serializationEcsComputeStrategy(strategies/ecs-strategy.ts): RunTask with container command override to invokeentrypoint.run_task()directly in batch mode (bypasses the uvicorn server that would otherwise sit idle)EcsAgentClusterCDK construct (ecs-agent-cluster.ts): ECS cluster, Fargate task definition (2 vCPU / 4 GB ARM64), security group (egress 443 only), CloudWatch log group, scoped IAM rolesorchestrate-task.ts): Returns fullSessionHandlefrom start-session step; adds ECS-level crash detection in the polling loop alongside existing DDB pollingTaskOrchestratorconstruct: Conditional ECS env vars, IAM policies (ecs:RunTask/DescribeTasks/StopTask, scopediam:PassRole), all-or-nothing prop validationAgentStackwiring: ECS infrastructure gated behindABCA_ENABLE_ECS=trueenv flag — zero impact on existing deploymentsTesting
ABCA_ENABLE_ECS=true, submitted a task withcompute_type: 'ecs'blueprint, agent completed in ~5 min and opened a PR via ECS FargateTest plan
mise run buildpasses (compile, test, synth, lint, docs)ABCA_ENABLE_ECS=trueproduces valid template